Regulation of prokaryotic gene expression with zinc finger proteins

ABSTRACT

Chimeric zinc finger proteins and methods of using zinc finger proteins to regulate gene expression in prokaryotes are disclosed herein.

BACKGROUND

[0001] Most genes are regulated at the transcriptional level by polypeptide transcription factors that bind to specific DNA sites within the gene, typically in promoter or enhancer regions. These proteins activate or repress transcriptional initiation by RNA polymerase at the promoter, thereby regulating expression of the target gene. Many transcription factors, both activators and repressors, include structurally distinct domains that have specific functions, such as DNA binding, dimerization, or interaction with the transcriptional machinery. The DNA binding portion of the transcription factor itself can be composed of independent structural domains that contact DNA. The three-dimensional structures of many DNA-binding domains, including zinc finger domains, homeodomains, and helix-turn-helix domains, have been determined from NMR and X-ray crystallographic data. Effector domains such as activation domains or repression domains retain their function when transferred to DNA-binding domains of heterologous transcription factors (Brent and Ptashne, (1985) Cell 43:729-36; Dawson et al., (1995) Mol. Cell Biol. 15:6923-31).

[0002] Artificial transcription factors can be produced that are chimeras of zinc finger domains. For example, WO 01/60970 (Kim et al.) describes methods for determining the specificity of zinc finger domains and for constructing artificial transcription factors that recognize particular target sites.

[0003] In bacteria, genes are grouped into operons, which are gene clusters that encode the proteins necessary to perform a coordinated function, such as biosynthesis of a given amino acid. RNA that is transcribed from a prokaryotic operon is polycistronic, such that multiple proteins are encoded in a single transcript. Gene expression in bacteria can be controlled at the level of transcription initiation, which is regulated by DNA sequence elements upstream of the site of transcriptional initiation that are recognized and contacted by RNA polymerase. RNA polymerase can be regulated, in turn, by interaction with accessory proteins, which can act both positively (activators) and negatively (repressors). The mechanisms by which transcription is regulated in prokaryotes are thought to be less complex than those observed in eukaryotic organisms.

SUMMARY

[0004] The invention provides methods and compositions for regulating gene expression in prokaryotes. In one aspect, the invention features a method of regulating expression of a gene in a prokaryotic cell, the method including: providing a prokaryotic cell comprising a nucleic acid encoding a polypeptide (e.g., an artificial, chimeric polypeptide), wherein the polypeptide comprises a zinc finger domain, and wherein the polypeptide binds to a target DNA site in a gene; and expressing the nucleic acid encoding the polypeptide in the cell under conditions in which the polypeptide is produced, binds to the target DNA site, and regulates the gene.

[0005] The artificial polypeptide can include two, three, four, five, six, or more zinc finger domains. In one embodiment, the artificial polypeptide includes three zinc finger domains. In one embodiment, the artificial polypeptide includes four zinc finger domains. In one embodiment, the artificial polypeptide includes five or more zinc finger domains.

[0006] The zinc finger domain or domains of the artificial polypeptide can be naturally-occurring zinc finger domains or variants thereof. In one embodiment, each zinc finger domain of the artificial polypeptide is identical to a naturally-occurring zinc finger domain. In one embodiment, the artificial polypeptide includes a first zinc finger domain that is identical to a naturally-occurring zinc finger domain, and a second zinc finger domain that is a variant of a naturally-occurring zinc finger domain.

[0007] In one embodiment, the artificial polypeptide includes two zinc finger domains, wherein each of the two zinc finger domains is identical to a zinc finger domain of a same naturally-occurring protein, or a variant thereof. In one embodiment, the artificial polypeptide includes two zinc finger domains, wherein each of the zinc finger domains is identical to a zinc finger domain of a different naturally-occurring protein, or a variant thereof. In one embodiment, the artificial polypeptide includes two zinc finger domains, and each of the two zinc finger domains is identical to a non-adjacent zinc finger domain of a same naturally-occurring protein.

[0008] The artificial polypeptide can include one or more of the following features:

[0009] the artificial polypeptide regulates expression of an endogenous gene; the artificial polypeptide regulates expression of an exogenous (e.g., heterologous) gene; the artificial polypeptide regulates expression of a phage gene; the artificial polypeptide regulates expression of a transposon gene; the artificial polypeptide has a dissociation constant for a DNA site of less than 50 nM; the artificial polypeptide includes one or more zinc finger domains, wherein the DNA contacting residues of one or more of the zinc finger domains at positions −1, +2, +3, and +6 correspond to an amino acid motif selected from the following: RSHR, HSSR, ISNR, RDHT, QTHR, VSTR, QNTQ, CSNR, QSHV, VSNV, QSNK, QSSR, WSNR, DSAR, QTHQ, QSNR, and CSNR. In one embodiment, the non-DNA contacting residues are identical to a set of non-DNA contacting residues described herein. For example, the zinc finger domain can include a zinc finger domain from Table 1.

TABLE 1

ZFD     Amino Acid Sequence          SEQ ID NO:
H1.1    YKCMECGKAFNRRSHLTRHQRIH      1
H1.2    FKCPVCGKAFRHSSSLVRHQRTH      2
H1.3    YRCKYCDRSFSISSNLQRHVRNIH     3
H2.1    YTCSYCGKSFTQSNTLKQHTRIH      4
H2.2    YKCKQCGKAFGCPSNLRRHGRTH      5
H2.3    YRCKYCDRSFSISSNLQRHVRNIH     6
H3.1    YRCKYCDRSFSISSNLQRHVRNIH     7
H3.2    FQCKTCQRKFSRSDHLKTHTRTH      8
H3.3    YECHDCGKSFRQSTHLTRHRRIH      9
H3.4    YECNYCGKTFSVSSTLIRHQRIH      10
T1.1    YECDHCGKSFSQSSHLNVHKRTH      11
T1.2    YECDHCGKAFSVSSNLNVHRRIH      12
T1.3    YKCEECGKAFTQSSNLTKHKKIH      13
T1.4    YKCEECGKAFTQSSNLTKHKKIH      14
T2.1    FQCKTCQRKFSRSDHLKTHTRTH      15
T2.2    YECDHCGKSFSQSSHLNVHKRTH      16
T2.3    YECHDCGKSFRQSTHLTRHRRIH      17
T2.4    YKCPDCGKSFSQSSSLIRHQRTH      18
T3.1    YRCEECGKAFRWPSNLTRHKRIH      19
T3.2    YECDHCGKSFSQSSHLNVHKRTH      20
T3.3    YECDHCGKAFSVSSNLNVHRRIH      21
T3.4    YECDHCGKSFSQSSHLNVHKRTH      22
T4.1    YECHDCGKSFRQSTHLTRHRRIH      23
T4.2    YKCMECGKAFNRRSHLTRHQRIH      24
T4.3    YECHDCGKSFRQSTHLTRHRRIH      25
T4.4    YECHDCGKSFRQSTHLTRHRRIH      26
T5.1    FMCTWSYCGKRFTDRSALARHKRTH    27
T5.2    FQCKTCQRKFSRSDHLKTHTRTH      28
T5.3    YECDHCGKSFSQSSHLNVHKRTH      29
T5.4    YECHDCGKSFRQSTHLTRHRRIH      30
T6.1    YECHDCGKSFRQSTHLTQHRRIH      31
T6.2    YKCMECGKAFNRRSHLTRHQRIH      32
T6.3    YECHDCGKSFRQSTHLTRHRRIH      33
T6.4    YECHDCGKSFRQSTHLTRHRRIH      34
T7.1    YECDHCGKSFSQSSHLNVHKRTH      35
T7.2    YECDHCGKAFSVSSNLNVHRRIH      36
T7.3    FECKDCGKAFIQKSNLRHQRTH       37
T7.4    YKCKQCGKAFGCPSNLRRHGRTH      38
T8.1    YECDHCGKAFSVSSNLNVHRRIH      39
T8.2    YECHDCGKSFRQSTHLTRHRRIH      40
T8.3    YKCPDCGKSFSQSSSLIRHQRTH      41
T8.4    FQCKTCQRKFSRSDHLKTHTRTH      42
T9.1    FQCKTCQRKFSRSDHLKTHTRTH      43
T9.2    YECDHCGKSFSQSSHLNVHKRTH      44
T9.3    YECHDCGKSFRQSTHLTRHRRIH      45
T9.4    FECKDCGKAFIQKSNLIRHQRTH      46
T10.1   FMCTWSYCGKRFTDRSALARHKRTH    47
T10.2   FQCKTCQRKFSRSDHLKTHTRTH      48
T10.3   YKCEECGKAFTQSSNLTKHKKIH      49
T10.4   YECHDCGKSFRQSTHLTRHRRIH      50
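The four-letter motifs can be read directly off a Table 1 sequence. The following is a minimal Python sketch, not part of the disclosed methods, and it rests on two assumptions that the text does not state: that each domain follows the canonical C2H2 spacing (Cys, 2 to 5 residues, Cys, 12 residues, His, 3 to 5 residues, His), and that the first zinc-coordinating histidine sits at helix position +7, with no position 0. The names C2H2 and contact_motif are hypothetical.

```python
import re

# ASSUMPTION: canonical C2H2 spacing. The captured group is the first
# zinc-coordinating His, taken here to be helix position +7.
C2H2 = re.compile(r"C.{2,5}C.{12}(H).{3,5}H")

def contact_motif(domain: str) -> str:
    """Return the residues at helix positions -1, +2, +3 and +6."""
    match = C2H2.search(domain)
    if match is None:
        raise ValueError("no canonical C2H2 pattern found in " + domain)
    first_his = match.start(1)
    # The numbering skips 0, so -1 lies 7 residues before position +7.
    offsets = (-7, -5, -4, -1)
    return "".join(domain[first_his + off] for off in offsets)

if __name__ == "__main__":
    # Domains H1.1, H1.2 and H1.3 from Table 1.
    for seq in ("YKCMECGKAFNRRSHLTRHQRIH",
                "FKCPVCGKAFRHSSSLVRHQRTH",
                "YRCKYCDRSFSISSNLQRHVRNIH"):
        print(seq, contact_motif(seq))
```

Under these assumptions, domains H1.1, H1.2, and H1.3 yield RSHR, HSSR, and ISNR, which agrees with the motif order recited for a three-finger polypeptide elsewhere in this summary.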

[0010] The artificial polypeptide can include an amino acid sequence that differs by 1 to 8 amino acid substitutions, deletions, or insertions from a sequence in Table 1. The substitution may be at a position other than a DNA contacting residue, e.g., between a metal-coordinating cysteine and position −1. The substitutions can be conservative substitutions.
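One way to count substitutions, deletions, and insertions between a candidate sequence and a Table 1 sequence is the Levenshtein edit distance. The sketch below is an illustrative assumption about how such a count could be made; the text does not prescribe a particular algorithm, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-residue substitutions, insertions,
    and deletions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, res_a in enumerate(a, start=1):
        curr = [i]
        for j, res_b in enumerate(b, start=1):
            cost = 0 if res_a == res_b else 1
            curr.append(min(prev[j] + 1,          # delete res_a
                            curr[j - 1] + 1,      # insert res_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def differs_by_1_to_8(candidate: str, reference: str) -> bool:
    """True if the candidate is 1 to 8 edits away from the reference."""
    return 1 <= edit_distance(candidate, reference) <= 8

if __name__ == "__main__":
    t6_1 = "YECHDCGKSFRQSTHLTQHRRIH"  # Table 1, T6.1
    t4_1 = "YECHDCGKSFRQSTHLTRHRRIH"  # Table 1, T4.1
    print(edit_distance(t6_1, t4_1), differs_by_1_to_8(t6_1, t4_1))
```

This count treats every position alike; restricting changes to positions other than the DNA contacting residues would need an additional check that is not shown.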

[0011] In one embodiment, the artificial polypeptide includes one or more of the zinc finger domains shown in Table 1.

[0012] In one embodiment, the artificial polypeptide includes an amino acid sequence at least 75%, 80%, 85%, 90%, 95%, 99%, or 100% identical to a sequence of a zinc finger protein in Table 2.

TABLE 2

ZFP   Amino Acid Sequence                                                                                            SEQ ID NO:
H1    YKCMECGKAFNRRSHLTRHQRIHTGEKPFKCPVCGKAFRHSSSLVRHQRTHTGEKPYRCKYCDRSFSISSNLQRHVRNIH                               51
H2    YTCSYCGKSFTQSNTLKQHTRIHTGEKPYKCKQCGKAFGCPSNLRRHGRTHTGEKPYRCKYCDRSFSISSNLQRHVRNIH                               52
H3    YRCKYCDRSFSISSNLQRHVRNIHTGEKPFQCKTCQRKFSRSDHLKTHTRTHTGEKPYECHDCGKSFRQSTHLTRHRRIHTGEKPYECNYCGKTFSVSSTLIRHQRIH   53
T1    YECDHCGKSFSQSSHLNVHKRTHTGEKPYECDHCGKAFSVSSNLNVHRRIHTGEKPYKCEECGKAFTQSSNLTKHKKIHTGEKPYKCEECGKAFTQSSNLTKHKKIH    54
T2    FQCKTCQRKFSRSDHLKTHTRTHTGEKPYECDHCGKSFSQSSHLNVHKRTHTGEKPYECHDCGKSFRQSTHLTRHRRIHTGEKPYKCPDCGKSFSQSSSLIRHQRTH    55
T3    YRCEECGKAFRWPSNLTRHKRIHTGEKPYECDHCGKSFSQSSHLNVHKRTHTGEKPYECDHCGKAFSVSSNLNVHRRIHTGEKPYECDHCGKSFSQSSHLNVHKRTH    56
T4    YECHDCGKSFRQSTHLTRHRRIHTGEKPYKCMECGKAFNRRSHLTRHQRIHTGEKPYECHDCGKSFRQSTHLTRHRRIHTGEKPYECHDCGKSFRQSTHLTRHRRIH    57
T5    FMCTWSYCGKRFTDRSALARHKRTHTGEKPFQCKTCQRKFSRSDHLKTHTRTHTGEKPYECDHCGKSFSQSSHLNVHKRTHTGEKPYECHDCGKSFRQSTHLTRHRRIH  58
T6    YECHDCGKSFRQSTHLTQHRRIHTGEKPYKCMECGKAFNRRSHLTRHQRIHTGEKPYECHDCGKSFRQSTHLTRHRRIHTGEKPYECHDCGKSFRQSTHLTRHRRIH    59
T7    YECDHCGKSFSQSSHLNVHKRTHTGEKPYECDHCGKAFSVSSNLNVHRRIHTGEKPFECKDCGKAFIQKSNLIRHQRTHTGEKPYKCKQCGKAFGCPSNLRRHGRTH    60
T8    YECDHCGKAFSVSSNLNVHRRIHTGEKPYECHDCGKSFRQSTHLTRHRRIHTGEKPYKCPDCGKSFSQSSSLIRHQRTHTGEKPFQCKTCQRKFSRSDHLKTHTRTH    61
T9    FQCKTCQRKFSRSDHLKTHTRTHTGEKPYECDHCGKSFSQSSHLNVHKRTHTGEKPYECHDCGKSFRQSTHLTRHRRIHTGEKPFECKDCGKAFIQKSNLIRHQRTH    62
T10   FMCTWSYCGKRFTDRSALARHKRTHTGEKPFQCKTCQRKFSRSDHLKTHTRTHTGEKPYKCEECGKAFTQSSNLTKHKKIHTGEKPYECHDCGKSFRQSTHLTRHRRIH  63
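The Table 2 sequences appear to consist of Table 1 domains joined end to end by the five-residue linker TGEKP. That reading is inferred from the sequences themselves and is not stated in the text. A minimal Python sketch of the assembly, with the hypothetical names LINKER and assemble_zfp, is:

```python
# ASSUMPTION: adjacent domains are joined by the linker TGEKP, as
# inferred by comparing the Table 1 and Table 2 sequences.
LINKER = "TGEKP"

def assemble_zfp(domains, linker=LINKER):
    """Join zinc finger domain sequences, N-terminal to C-terminal."""
    return linker.join(domains)

if __name__ == "__main__":
    h1_domains = [
        "YKCMECGKAFNRRSHLTRHQRIH",   # H1.1
        "FKCPVCGKAFRHSSSLVRHQRTH",   # H1.2
        "YRCKYCDRSFSISSNLQRHVRNIH",  # H1.3
    ]
    print(assemble_zfp(h1_domains))
```

Joining H1.1, H1.2, and H1.3 in this way reproduces the H1 sequence of Table 2.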

[0013] The artificial polypeptide can include an epitope tag, e.g., a V5 epitope tag (e.g., having the following amino acid sequence: GKPIPNPLLGLDS (SEQ ID NO:64)).

[0014] In one embodiment, the artificial polypeptide binds within 50, 40, 30, 20, or 10 nucleotides of a −35 or −10 element of a prokaryotic gene. In one embodiment, the artificial polypeptide binds a transcription factor binding site or binds a site that overlaps a transcription factor binding site.

[0015] Expression of the nucleic acid encoding the artificial polypeptide can be regulatable, e.g., by operably linking the sequence encoding the artificial polypeptide to a regulatable promoter. Regulatable promoters include promoters responsive to thermal changes, hormones, metals, metabolites, antibiotics, or chemical agents. In one embodiment, expression of the nucleic acid encoding the artificial polypeptide is regulatable with IPTG (e.g., the sequence encoding the artificial polypeptide is operably linked to a lac promoter).

[0016] The artificial polypeptide can include other features described herein.

[0017] In one embodiment, the artificial polypeptide regulates expression of an endogenous gene (e.g., directly or indirectly). In one embodiment, the artificial polypeptide regulates expression of two, three, four, or more endogenous genes. In one embodiment, the artificial polypeptide regulates expression of one or more endogenous genes by modulating transcription of a polycistronic RNA.

[0018] The method can further include characterizing the endogenous gene. For example, DNA comprising the target DNA site of the artificial polypeptide can be isolated (e.g., by cross-linking the artificial protein to the DNA, immunoprecipitating the artificial protein, and isolating the DNA associated with the protein), and nucleotides associated with the target DNA site can be sequenced. A gene associated with the target DNA site can be identified. The method can further include identifying a homolog of the endogenous gene in a second cell, and regulating the expression of the homolog in the second cell. The second cell can be a prokaryotic cell or a eukaryotic cell.

[0019] In one embodiment, the artificial polypeptide regulates expression of a heterologous gene. In one embodiment, the artificial polypeptide regulates expression of two, three, or more heterologous genes.

[0020] In one embodiment, the artificial polypeptide includes a transcriptional activation domain. In one embodiment, the artificial polypeptide includes a transcriptional repression domain.

[0021] In one embodiment, expression of the gene is repressed (e.g., relative to expression of the gene in the absence of the artificial protein, or relative to a reference value). In one embodiment, expression of the gene is activated (e.g., relative to expression of the gene in the absence of the artificial protein, or relative to a reference value).

[0022] In one embodiment, the cell is a bacterial cell, e.g., an E. coli cell. The cell can be any prokaryotic cell, e.g., a Gram-negative bacterial cell, a Gram-positive bacterial cell, a pathogenic bacterial cell, or a non-pathogenic bacterial cell (e.g., a commensal bacterial cell). The cell can be selected from a cell of one of the following species: Mycobacterium spp. (e.g., Mycobacterium tuberculosis, Mycobacterium leprae), Lactobacillus spp., Streptococcus spp. (e.g., Streptococcus pneumoniae, Streptococcus pyogenes), Staphylococcus spp. (e.g., Staphylococcus aureus), Bacillus spp. (e.g., Bacillus subtilis, Bacillus anthracis), Campylobacter spp., Pseudomonas spp. (e.g., Pseudomonas aeruginosa), Clostridium spp. (e.g., Clostridium tetani, Clostridium botulinum, Clostridium perfringens), Salmonella spp. (e.g., Salmonella typhi), Corynebacterium spp. (e.g., Corynebacterium diphtheriae), Escherichia spp. (e.g., Escherichia coli), Listeria spp. (e.g., Listeria monocytogenes), Streptomyces spp., and Thermobifida spp.

[0023] A plurality of cells can be provided.

[0024] The regulating can alter a trait of the cell relative to a reference cell, e.g., a cell that does not express the artificial polypeptide. The trait can be any detectable phenotype, e.g., a phenotype that can be observed, selected, inferred, and/or quantitated. Traits include: heat resistance, solvent resistance, heavy metal resistance, osmolarity resistance, resistance to extreme pH, chemical resistance, cold resistance, resistance to a genotoxic agent, and resistance to radioactivity.

[0025] For example, the trait is resistance to an environmental condition, e.g., heavy metals, salinity, environmental toxins, biological toxins, pathogens, parasites, other environmental extremes (e.g., desiccation, heat, cold), and so forth. In a related example, the trait is stress resistance (e.g., to heat, cold, extreme pH, chemicals, such as ammonia, drugs, osmolarity, and ionizing radiation). In yet another example, the trait is drug resistance. The change in the trait can be in either direction, e.g., towards sensitivity or further resistance.

[0026] In one embodiment, the artificial polypeptide regulates expression of an endogenous gene that encodes a decarboxylase enzyme. In one embodiment, the decarboxylase enzyme is a decarboxylase enzyme of a ubiquinone biosynthetic pathway, e.g., a ubiX gene product of E. coli.

[0027] In another aspect, the invention features a method including: providing a plurality of prokaryotic cells, wherein each cell of the plurality comprises a nucleic acid encoding an artificial polypeptide, wherein the artificial polypeptide comprises a zinc finger domain, and wherein the artificial polypeptide differs among the cells of the plurality; and identifying from the plurality a cell that has a trait that is altered relative to a reference cell. The reference cell can be a cell that does not include a nucleic acid encoding the artificial polypeptide, e.g., the reference cell is a parental cell from which the plurality of cells was made, or a derivative thereof.

[0028] The trait can be any detectable phenotype, e.g., a phenotype that can be observed, selected, inferred, and/or quantitated. The artificial polypeptide can be a chimeric polypeptide. As used herein, a chimeric polypeptide includes at least two binding domains that are heterologous to each other (e.g., two zinc finger domains). The two binding domains can be from different naturally occurring proteins. The artificial polypeptide can include one or more features described herein.

[0029] In many embodiments, the cell does not include a reporter gene. In other words, the cells can be screened without having, a priori, information about a target gene whose regulation is altered by expression of the chimeric polypeptide. In addition, the cell may include a reporter gene as an additional indicator of a marker that is related or unrelated to the trait. Likewise, one or more target genes may be known prior to the screening.

[0030] In another example, the trait is production of a compound (e.g., a natural or artificial compound).

[0031] The trait can be resistance to an environmental condition, e.g., heavy metals, salinity, environmental toxins, biological toxins, pathogens, parasites, other environmental extremes (e.g., desiccation, heat, cold), and so forth. In a related example, the trait is stress resistance (e.g., to heat, cold, extreme pH, chemicals, such as ammonia, drugs, osmolarity, and ionizing radiation). In yet another example, the trait is drug resistance. The change in the trait can be in either direction, e.g., towards sensitivity or further resistance.

[0032] In one embodiment, the trait is tolerance to an organic solvent, and the identifying comprises exposing cells of the plurality to the organic solvent and evaluating survival of the cells. In one embodiment, the trait is heat tolerance, and the evaluating comprises exposing the cells to heat.

[0033] In various embodiments, the identifying includes evaluating cell survival under a set of conditions.

[0034] Typically, one or more of the zinc finger domains of the artificial polypeptides vary among nucleic acids of the library. The nucleic acid can also express at least a third DNA binding domain, e.g., a third zinc finger domain.

[0035] The cells of the plurality can include nucleic acids encoding a sufficient number of different artificial polypeptides to recognize at least 10, 20, 30, 40, or 50 different 3-base pair DNA sites. In one embodiment, the cells of the plurality include nucleic acids encoding a sufficient number of artificial polypeptides to recognize no more than 30, 20, 10, or 5 different 3-base pair DNA sites.

[0036] The method can further include isolating the nucleic acid encoding the artificial polypeptide from the identified cell and/or isolating the artificial polypeptide from the identified cell. The nucleic acid encoding the artificial polypeptide can be sequenced.

[0037] In one embodiment, the method further includes: isolating the nucleic acid encoding the artificial polypeptide from the cell, introducing the nucleic acid into a second plurality of cells, culturing the cells of the second plurality under conditions wherein the artificial polypeptide is produced, and identifying a cell of the second plurality having a trait that is altered relative to a reference cell.

[0038] The sequence of the target DNA site of the artificial polypeptide can be determined, e.g., by a computer string or profile search of a sequence database, or by selecting in vitro the nucleic acids that bind to the artificial polypeptide (e.g., by SELEX).

[0039] The method can further include analyzing the expression of one or more genes of the cell, e.g., using mRNA profiling (e.g., using microarray analysis), 2-D gel electrophoresis, an array of protein ligands (e.g., antibodies), and/or mass spectroscopy. A single gene or protein, or a small number of genes or proteins, can also be profiled. In one embodiment, the profile is compared to a database of reference profiles. In another embodiment, regulatory regions of genes whose expression is altered by expression of the identified chimeric polypeptide are compared to identify candidate sites that determine coordinate regulation that results directly or indirectly from expression of the artificial polypeptide.

[0040] An endogenous gene bound by the artificial polypeptide can be characterized, e.g., identified by sequencing. Expression of the endogenous gene can be regulated in a second cell, e.g., by a means other than ZFP-mediated regulation, e.g., by knocking out the gene, or overexpressing the gene in the second cell.

[0041] The cells of the plurality can include nucleic acids encoding artificial polypeptides comprising naturally-occurring zinc finger domain(s), or variants thereof. The naturally-occurring zinc finger domains can be domains of any eukaryotic zinc finger protein: for example, a fungal (e.g., yeast), plant, or animal protein (e.g., a mammalian protein, such as a human or murine protein).

[0042] The cells of the plurality can include nucleic acids encoding artificial polypeptides comprising one, two, three, or four zinc finger domains. In one embodiment, the artificial polypeptides include at least three zinc finger domains. The artificial polypeptides encoded by the nucleic acids can include other features described herein.

[0043] In one embodiment, the cells of the plurality are E. coli cells.

[0044] The method can further include cultivating the identified cell to exploit the altered trait. For example, if the altered trait is increased production of a metabolite, the method can include cultivating the cell to produce the metabolite. The cell can be the cell isolated from the plurality, or a cell into which the nucleic acid encoding the artificial polypeptide has been re-introduced. Expression of the artificial polypeptide can be tuned, e.g., using an inducible promoter or another conditional promoter (e.g., a cell type specific promoter), in order to finely vary the trait. A cell containing the nucleic acid encoding the artificial polypeptide can be introduced into an organism (e.g., ex vivo treatment).

[0045] Exemplary applications of these methods include: identifying essential genes (e.g., in a pathogenic microbe), identifying genes required for a particular phenotype, identifying targets of drug candidates, gene discovery in signal transduction pathways, microbial engineering and industrial biotechnology, increasing yield of metabolites of commercial interest, and modulating growth behavior (e.g., improving growth of a microorganism).

[0046] In another aspect, the invention features a prokaryotic cell including: a nucleic acid encoding an artificial polypeptide, wherein the artificial polypeptide comprises a zinc finger domain, and wherein the artificial polypeptide binds to a target DNA site in a gene and regulates expression of the gene under conditions in which the nucleic acid is expressed. The cell can be an E. coli cell.

[0047] In one embodiment, the artificial polypeptide regulates expression of an endogenous gene. In one embodiment, the artificial polypeptide regulates expression of a heterologous gene.

[0048] The artificial polypeptide can include one, two, three, four, five, six, or more zinc finger domains. In one embodiment, the artificial polypeptide comprises three zinc finger domains. In one embodiment, the artificial polypeptide comprises four zinc finger domains.

[0049] The zinc finger domain(s) of the artificial polypeptide can be naturally-occurring zinc finger domains, or variants thereof. The naturally-occurring zinc finger domains can be domains from any eukaryotic zinc finger protein: for example, a fungal (e.g., yeast), plant, or animal protein (e.g., a mammalian protein, such as a human or murine protein).

[0050] The artificial polypeptides can include other features described herein.

[0051] In another aspect, the invention features a cell selected by a method, the method including: providing a plurality of prokaryotic cells, wherein each cell of the plurality comprises a nucleic acid encoding an artificial polypeptide, wherein the artificial polypeptide comprises a zinc finger domain, and wherein the artificial polypeptide differs among the cells of the plurality; and identifying from the plurality a cell that has a trait that is altered relative to a reference cell. The reference cell can be, e.g., a cell that does not include a nucleic acid encoding an artificial polypeptide, e.g., the reference cell is a parental cell from which the plurality of cells was made, or a derivative thereof.

[0052] The trait can be any detectable phenotype, e.g., a phenotype that can be observed, selected, inferred, and/or quantitated. The artificial polypeptide can be a chimeric polypeptide. An artificial polypeptide can include one or more features described herein.

[0053] In another aspect, the invention features a polypeptide including at least one zinc finger domain, wherein the DNA contacting residues of the zinc finger domain at positions −1, +2, +3, and +6 correspond to a motif selected from: RSHR, HSSR, ISNR, RDHT, QTHR, VSTR, QNTQ, and CSNR, and wherein the polypeptide regulates an endogenous prokaryotic gene and/or alters the phenotype of a prokaryotic cell.

[0054] The polypeptide can further include a second and third zinc finger domain, wherein the DNA contacting residues of the first, second, and third domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs RSHR, HSSR, and ISNR.

[0055] The polypeptide can further include a second and third zinc finger domain, wherein the DNA contacting residues of the first, second, and third domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs ISNR, RDHT, and QTHR.

[0056] The polypeptide can further include a fourth zinc finger domain, wherein the DNA contacting residues of the fourth domain at positions −1, +2, +3, and +6 correspond to the motif VSTR.

[0057] The polypeptide can further include a second and third zinc finger domain, wherein the DNA contacting residues of the first, second, and third domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QNTQ, CSNR, and ISNR.

[0058] In another aspect, the invention features a polypeptide including at least one zinc finger domain, wherein the DNA contacting residues of the zinc finger domain at positions −1, +2, +3, and +6 correspond to a motif selected from: QSHV, VSNV, QSNK, RDHT, QTHR, QSSR, WSNR, VSNV, RSHR, DSAR, QTHQ, RSHR, QSNR, and CSNR, and wherein the polypeptide regulates an endogenous prokaryotic gene and/or alters the phenotype of a prokaryotic cell.

[0059] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QSHV, VSNV, QSNK, and QSNK.

[0060] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs RDHT, QSHV, QTHR1, and QSSR.

[0061] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs WSNR, QSHV, VSNV, and QSHV.

[0062] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QTHR, RSHR, QTHR, and QTHR.

[0063] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs DSAR, RDHT, QSHV, and QTHR.

[0064] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QTHQ, RSHR, QTHR, and QTHR.

[0065] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QSHV, VSNV, QSNR, and CSNR.

[0066] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs VSNV, QTHR, QSSR, and RDHT.

[0067] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs RDHT, QSHV, QTHR, and QSNR.

[0068] In one embodiment, the polypeptide further includes a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs DSAR, RDHT, QSNK, and QTHR.

[0069] In another aspect, the invention features an isolated nucleic acid encoding an artificial polypeptide described herein.

[0070] In another aspect, the invention features a bacterial nucleic acid expression vector encoding an artificial polypeptide described herein.

[0071] In another aspect, the invention features a method of producing a polypeptide, the method including: providing a prokaryotic cell, wherein the cell expresses an artificial polypeptide comprising a zinc finger domain, and wherein the artificial polypeptide binds to a target DNA site in a gene; culturing the cell under conditions that permit production of the polypeptide at a level higher or lower (e.g., at least two, three, five, ten, or a hundred fold) than the level produced by an identical cell that includes the gene but not the artificial polypeptide; and detecting the polypeptide produced by the cell and/or purifying the polypeptide from the cell and/or from the medium that surrounds the cell. The polypeptide can be an endogenous or heterologous polypeptide. Production of the polypeptide by the cell can be directly or indirectly regulated by the artificial polypeptide. The method can further include introducing the cell into a subject. The method can further include formulating the polypeptide with a pharmaceutically acceptable carrier.

[0072] In another aspect, the invention features a method of preparing amodified prokaryotic cell, the method including providing a nucleic acidlibrary that includes a plurality of nucleic acids, each encoding adifferent artificial polypeptide, each polypeptide including at leasttwo zinc finger domains; identifying a first and a second member of thelibrary which alters a given trait of a cell; and preparing a cell thatcan express first and second polypeptides, the first and secondpolypeptides being encoded respectively by the first and secondidentified library members. The method can also be extended toadditional member, e.g., a third member. The method can further includeevaluating the given trait for the prepared cell. The method can includeother features described herein.

[0073] In another aspect, the method includes a method of producing acellular product. The method includes providing a modified cell thatincludes a nucleic acid encoding an artificial polypeptide; maintainingthe modified cell under conditions in which the artificial polypeptideis produced; and recovering a product produced by the cultured cell,wherein the product is other than the artificial polypeptide. Forexample, the artificial polypeptide can confer stress resistance, oranother property described herein, e.g., altered protein production,altered metabolite production, and so forth. For example, the artificialpolypeptide includes at least two zinc finger domains. One or more ofthe zinc finger domains can be naturally occurring, e.g., a naturallyoccurring domain in Table 3. Exemplary artificial polypeptides includepolypeptides that have one or more consecutive motifs (e.g., at leasttwo, three or four consecutive motifs, or at least three motifs in thesame pattern, including non-consecutive patterns) as described herein.

[0074] Exemplary products include a metabolite or a protein (e.g., an endogenous or heterologous protein). For example, the modified cell further includes a second nucleic acid encoding a heterologous protein, and the heterologous protein participates in production of the metabolite. The modified cell can be maintained at a temperature between 20° C. and 40° C. or greater than 37° C. In one embodiment, the modified cell is maintained under conditions which would inhibit the growth of a substantially identical cell that lacks the artificial polypeptide.

[0075] In another aspect, the invention features an artificial polypeptide that alters sensitivity of a cell expressing the artificial polypeptide to a toxic agent (e.g., a catabolite of the cell or a chemical) relative to an identical cell that does not express the artificial polypeptide. The sensitivity can be increased or decreased. Exemplary artificial polypeptides include polypeptides that have one or more zinc finger domains, e.g., zinc finger domains including motifs as described herein.

[0076] With respect to all methods described herein, a library of nucleic acids that encode chimeric zinc finger proteins can be used. The term “library” refers to a physical collection of similar, but non-identical biomolecules. The collection can be, for example, together in one vessel or physically separated (into groups or individually) in separate vessels or on separate locations on a solid support. Duplicates of individual members of the library may be present in the collection. A library can include at least 10, 10², 10³, 10⁵, 10⁷, or 10⁹ different members, or fewer than 10¹³, 10¹², 10¹⁰, 10⁹, 10⁷, 10⁵, or 10³ different members.

[0077] A first exemplary library includes a plurality of nucleic acids, each nucleic acid encoding a polypeptide comprising at least first, second, and third zinc finger domains. As used herein, “first, second and third” denotes three separate domains that can occur in any order in the polypeptide: e.g., each domain can occur N-terminal or C-terminal to either or both of the others. The first zinc finger domain varies among nucleic acids of the plurality. The second zinc finger domain varies among nucleic acids of the plurality. At least 10 different first zinc finger domains are represented in the library. In one implementation, at least 0.5%, 1%, 2%, 5%, 10%, or 25% of the members of the library bind at least one target site with a dissociation constant of no more than 7, 5, 3, 2, 1, 0.5, or 0.05 nM. The first and second zinc finger domains can be from different naturally-occurring proteins or can be positioned in a configuration that differs from their relative positions in a naturally-occurring protein. For example, the first and second zinc finger domains may be adjacent in the polypeptide, but may be separated by one or more intervening zinc finger domains in a naturally occurring protein.

[0078] A second exemplary library includes a plurality of nucleic acids, each nucleic acid encoding a polypeptide that includes at least first and second zinc finger domains. The first and second zinc finger domains of each polypeptide (1) are identical to zinc finger domains of different naturally occurring proteins (and generally do not occur in the same naturally occurring protein or are positioned in a configuration that differs from their relative positions in a naturally-occurring protein), (2) differ by no more than four, three, two, or one amino acid residues from domains of naturally occurring proteins, or (3) are non-adjacent zinc finger domains from a naturally occurring protein. Identical zinc finger domains refer to zinc finger domains that are identical at each amino acid from the first metal coordinating residue (typically cysteine) to the last metal coordinating residue (typically histidine). The first zinc finger domain varies among nucleic acids of the plurality, and the second zinc finger domain varies among nucleic acids of the plurality. The naturally occurring protein can be any eukaryotic zinc finger protein: for example, a fungal (e.g., yeast), plant, or animal protein (e.g., a mammalian protein, such as a human or murine protein). Each polypeptide can further include a third, fourth, fifth, and/or sixth zinc finger domain. Each zinc finger domain can be a mammalian, e.g., human, zinc finger domain.

[0079] Other types of libraries can also be used, e.g., including mutated zinc finger domains.

[0080] In some embodiments, a library of nucleic acids encoding zinc finger proteins or a library of such proteins themselves can include members with different regulatory domains. For example, the library can include at least 10% of members with an activation domain, and at least another 10% of members with a repression domain. In another example, at least 10% have an activation domain or repression domain, and at least another 10% have no regulatory domain. In still another example, some include an activation domain; others, a repression domain; still others, no regulatory domain at all. Other percentages, e.g., at least 20%, 25%, 30%, 40%, 50%, or 60%, can also be used.

[0081] The term “gene” refers to coding and noncoding DNA sequence associated with the expression of a particular polypeptide. A gene includes, e.g., exonic sequences, intronic sequences, promoter, enhancer, and other regulatory sequences.

[0082] As used herein, the “dissociation constant” refers to the equilibrium dissociation constant of a polypeptide for binding to a 28-basepair double-stranded DNA that includes one 9-basepair target site. The dissociation constant is determined by gel shift analysis using a purified protein that is bound in 20 mM Tris pH 7.7, 120 mM NaCl, 5 mM MgCl₂, 20 μM ZnSO₄, 10% glycerol, 0.1% Nonidet P-40, 5 mM DTT, and 0.10 mg/mL BSA (bovine serum albumin) at room temperature. Additional details are provided in Example 10 and Rebar and Pabo (1994) Science 263:671-673.
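To illustrate how a dissociation constant relates to the fraction of DNA probe shifted in a gel shift experiment, the following minimal sketch assumes simple 1:1 binding with the protein in large excess over the labeled DNA, so that the fraction bound is [P]/([P] + Kd). The function names are hypothetical, and this is not the procedure of Example 10.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Fraction of DNA probe bound at equilibrium.

    Assumes 1:1 binding and protein in large excess over probe,
    so free protein is approximately equal to total protein.
    """
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("protein_nM must be >= 0 and kd_nM must be > 0")
    return protein_nM / (protein_nM + kd_nM)


def kd_from_point(protein_nM: float, bound_fraction: float) -> float:
    """Back-calculate Kd from one (protein concentration, fraction bound) point."""
    if not 0.0 < bound_fraction < 1.0:
        raise ValueError("bound_fraction must be strictly between 0 and 1")
    return protein_nM * (1.0 - bound_fraction) / bound_fraction


if __name__ == "__main__":
    # Titration over a range of protein concentrations for Kd = 1 nM.
    for conc in (0.05, 0.5, 1.0, 5.0, 50.0):
        print(f"{conc:6.2f} nM -> {fraction_bound(conc, 1.0):.3f} bound")
```

Under these assumptions, the protein concentration at which half of the probe is shifted equals the dissociation constant. In practice a value would be fitted over a full titration rather than taken from a single point.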

[0083] As used herein, the term “screen” refers to a process for evaluating members of a library to find one or more particular members that have a given property. In a direct screen, each member of the library is evaluated. For example, each cell is evaluated to determine if it is extending neurites. In another type of screen, termed a “selection,” each member is not directly evaluated. Rather the evaluation is made by subjecting the members of the library to conditions in which only members having a particular property are retained. Selections may be mediated by survival (e.g., drug resistance) or binding to a surface (e.g., adhesion to a substrate). Such selective processes are encompassed by the term “screening.”

[0084] The term “base contacting positions,” “DNA contacting positions,” or “nucleic acid contacting positions” refers to the four amino acid positions of a zinc finger domain that structurally correspond to the positions of amino acids arginine 73, aspartic acid 75, glutamic acid 76, and arginine 79 of Zif268 (SEQ ID NO:65):

Glu Arg Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg Phe Ser
1               5                   10                  15
Arg Ser Asp Glu Leu Thr Arg His Ile Arg Ile His Thr Gly Gln Lys
            20                  25                  30
Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser Asp His
        35                  40                  45
Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe Ala Cys
    50                  55                  60
Asp Ile Cys Gly Arg Lys Phe Ala Arg Ser Asp Glu Arg Lys Arg His
65                  70                  75                  80
Thr Lys Ile His Leu Arg Gln Lys Asp
                85

[0085] These positions are also referred to as positions −1, 2, 3, and 6, respectively. To identify positions in a query sequence that correspond to the base contacting positions, the query sequence is aligned to the zinc finger domain of interest such that the cysteine and histidine residues of the query sequence are aligned with those of finger 3 of Zif268. The ClustalW WWW Service at the European Bioinformatics Institute (Thompson et al. (1994) Nucleic Acids Res. 22:4673-4680) provides one convenient method of aligning sequences.
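As a rough illustration of locating the base contacting positions, the sketch below anchors on the zinc-coordinating cysteines and histidines instead of running a full alignment. It assumes the canonical spacing in which position −1 lies seven residues before the first coordinating histidine; the regular expression and the function name `base_contacts` are hypothetical simplifications, and a real workflow would use an alignment tool such as ClustalW.

```python
import re

# Zif268 fragment (SEQ ID NO:65) in one-letter code; residue 1 is Glu.
ZIF268 = (
    "ERPYACPVES" "CDRRFSRSDE" "LTRHIRIHTG" "QKPFQCRICM" "RNFSRSDHLT"
    "THIRTHTGEK" "PFACDICGRK" "FARSDERKRH" "TKIHLRQKD"
)

# C-X(2-5)-C, then 12 residues, then H-X(3-5)-H. The last 7 of the 12
# residues (captured group) are positions -1, 1, 2, 3, 4, 5, 6.
FINGER = re.compile(r"C.{2,5}C.{5}(.{7})H.{3,5}H")


def base_contacts(seq):
    """Return (contact residues, 1-based residue numbers) for each finger.

    Contacts are reported in the order -1, 2, 3, 6.
    """
    results = []
    for match in FINGER.finditer(seq):
        helix = match.group(1)
        first = match.start(1) + 1  # 1-based residue number of position -1
        residues = helix[0] + helix[2] + helix[3] + helix[6]
        numbers = (first, first + 2, first + 3, first + 6)
        results.append((residues, numbers))
    return results


if __name__ == "__main__":
    for residues, numbers in base_contacts(ZIF268):
        print(residues, numbers)
```

For the third finger this yields residues 73, 75, 76, and 79 (Arg, Asp, Glu, Arg), which agrees with the definition given above.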

[0086] Conservative amino acid substitutions refer to the interchangeability of residues having similar side chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; a group of amino acids having acidic side chains is aspartic acid and glutamic acid; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. Depending on circumstances, amino acids within the same group may be interchangeable. Some additional conservative amino acid substitution groups are: valine-leucine-isoleucine; phenylalanine-tyrosine; lysine-arginine; alanine-valine; aspartic acid-glutamic acid; and asparagine-glutamine.

[0087] The term “heterologous polypeptide” or “artificial polypeptide” refers either to a polypeptide with a non-naturally occurring sequence (e.g., a hybrid polypeptide) or a polypeptide with a sequence identical to a naturally occurring polypeptide but present in a milieu in which it does not naturally occur. For example, the fusion of two naturally occurring polypeptides that are not fused together in nature results in an artificial polypeptide in which one polypeptide is heterologous to the other.

[0088] The terms “hybrid” and “chimera” refer to a non-naturally occurring polypeptide that comprises amino acid sequences derived from (i) at least two different naturally occurring sequences, or non-contiguous regions of the same naturally occurring sequence, wherein the non-contiguous regions are made contiguous in the hybrid; (ii) at least one artificial sequence (i.e., a sequence that does not occur naturally) and at least one naturally occurring sequence; or (iii) at least two artificial sequences (same or different). Examples of artificial sequences include mutants of a naturally occurring sequence and de novo designed sequences. An “artificial sequence” is not present among naturally occurring sequences. With respect to any artificial sequence (e.g., protein or nucleic acid) described herein, the invention also refers to a sequence with the same elements, but which is not present in each of the following organisms whose genomes are sequenced: Homo sapiens, Mus musculus, Arabidopsis thaliana, Drosophila melanogaster, Escherichia coli, Saccharomyces cerevisiae, and Oryza sativa. A molecule with such a sequence can be expressed as a heterologous molecule in a cell of one of the aforementioned organisms.

[0089] The invention also includes sequences (not necessarily termed “artificial”) which are made by a method described herein, e.g., a method of joining nucleic acid sequences encoding different zinc finger domains or a method of phenotypic screening. The invention also features a cell that includes such a sequence.

[0090] As used herein, the term “hybridizes under stringent conditions” refers to conditions for hybridization in 6× sodium chloride/sodium citrate (SSC) at 45° C., followed by two washes in 0.2× SSC, 0.1% SDS at 65° C.

[0091] The term “binding preference” refers to the discriminative property of a polypeptide for selecting one nucleic acid binding site relative to another. For example, when the polypeptide is limiting in quantity relative to two different nucleic acid binding sites, a greater amount of the polypeptide will bind the preferred site relative to the other site in an in vivo or in vitro assay described herein.

[0092] A “reference cell” refers to any cell of interest. In one example, the reference cell is a parental cell for a cell that expresses a zinc finger protein, e.g., a cell that is substantially identical to the zinc finger protein expressing cell, but which does not produce the zinc finger protein.

[0093] A “transformed” or “transfected” cell refers to a cell that includes a heterologous nucleic acid. The cell can be made by introducing (e.g., transforming, transfecting, or infecting, e.g., using a viral particle) a nucleic acid into the cell or the cell can be a progeny or derivative of a cell thus made.

[0094] Among other advantages, many of the methods and compositions relate to the identification and use of new and useful zinc finger proteins for regulating gene expression in prokaryotic cells. Endogenous genes can be either up- or down-regulated using modular zinc finger proteins. Even without a transcriptional regulatory domain (e.g., a repression or activation domain), zinc finger proteins can be potent modulators of gene expression. It is possible to screen a plurality of cells expressing zinc finger proteins with different DNA binding specificities, in order to identify cells having altered traits due to altered gene expression. Moreover, gene expression in prokaryotes can be finely regulated by regulating expression of the zinc finger proteins. Depending on the DNA-binding affinity, chimeric polypeptides can cause a range of effects, e.g., moderate to strong activation and repression. This may lead to diverse phenotypes that are not necessarily obtained by complete inactivation or high-level overexpression of a particular target gene.

[0095] Methods described herein do not require a priori information (e.g., genome sequence) of the cell in order to identify useful chimeric proteins. Artificial chimeric proteins can be used as a tool to dissect pathways within a cell. For example, target genes responsible for the phenotypic changes in selected clones can be identified, e.g., as described herein. A zinc finger protein may mimic the function of a master regulatory protein, such as a master regulatory transcription factor. For example, the zinc finger protein may bind to the same site as the master regulatory protein, or to an overlapping site. The level of gene expression change, and thus the extent of the phenotype generated by the ZFP-TF, can be precisely controlled by altering the expression level of the zinc finger protein in cells.

[0096] All patents, patent applications, and references cited herein are incorporated by reference in their entirety. The following patent applications are expressly incorporated by reference in their entirety for all purposes: WO 01/60970 (Kim et al.); U.S. Ser. No. 60/338,441, filed Dec. 7, 2001; U.S. Ser. No. 60/313,402, filed Aug. 17, 2001; U.S. Ser. No. 60/374,355, filed Apr. 22, 2002; U.S. Ser. No. 60/376,053, filed Apr. 26, 2002; U.S. Ser. No. 60/400,904, filed Aug. 2, 2002; U.S. Ser. No. 60/401,089, filed Aug. 5, 2002; and U.S. Ser. No. 10/223,765, filed Aug. 19, 2002. The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Any feature described herein can be used in combination with another compatible feature also described herein. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF THE DRAWINGS

[0097] FIGS. 1A, 1B, and 1C are a set of pictures depicting phenotypic changes in E. coli induced by expression of artificial zinc finger proteins. FIG. 1A depicts growth of cells on LB plates in the presence or absence of 1.5% hexane. Clones H1, H2, and H3 expressed zinc finger proteins. Control cells (C; E. coli cells transformed with pZL1) did not express zinc finger proteins. FIG. 1B depicts growth of heat-shocked and untreated cells on LB plates. Selected clones (T1 to T10) expressed zinc finger proteins. Control cells (C; E. coli cells transformed with pZL1) did not express zinc finger proteins. FIG. 1C depicts growth of control cells (C; E. coli cells transformed with pZL1), cells expressing the T9 zinc finger protein (T9), and cells expressing a mutated version of T9 (T9-M) on LB plates. An arginine residue in the QTHR1 zinc finger domain of the T9 protein was mutated to alanine to produce T9-M. Cells were heat-shocked or untreated. In FIG. 1A and FIG. 1B, the triangles drawn above each panel indicate 10-fold serial dilutions (1:1 to 1:10,000, left to right) of spotted cells.

[0098] FIGS. 2A, 2B, and 2C depict the identification of a target gene regulated by zinc finger proteins.

[0099] FIG. 2A (left panel) depicts growth of control cells (C; E. coli cells transformed with pZL1), cells transformed with zinc finger protein T9, and cells containing a disruption in the ubiX gene (ubiX) on LB plates. Cells were heat-shocked or untreated. The triangles drawn above each panel indicate 10-fold serial dilutions (1:1 to 1:10,000, left to right) of spotted cells. FIG. 2A (right panel) is a graph depicting the percent survival of heat-shocked control cells (C; E. coli cells transformed with pZL1), T9-transformed cells, and cells containing a disruption in the ubiX gene (ubiX). FIG. 2B is a graph depicting the relative level of ubiX transcripts in control and T9-expressing cells. FIG. 2C is a schematic diagram depicting the interaction of T9-ZFP with potential binding sites located in the ubiX promoter. The position of potential binding sites relative to the transcription start site is indicated. Binding of T9-ZFP to the position was confirmed by immunoprecipitation.

[0100] Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

[0101] The invention is based, in part, on the discovery that zinc finger proteins can regulate gene expression in prokaryotic organisms. Zinc finger proteins (e.g., zinc finger proteins that include eukaryotic zinc finger domains) can modulate expression of endogenous genes in prokaryotes.

[0102] Expression of libraries of zinc finger proteins in prokaryotic cells can allow the identification of zinc finger proteins that alter a phenotype of the cells. Furthermore, expression of these proteins enables the identification of gene products (e.g., endogenously-expressed gene products), the modulation of which alters a phenotype of the cells.

[0103] In one embodiment, a nucleic acid library that encodes artificial polypeptides which include random chimeras of zinc finger domains is transformed into prokaryotic cells (e.g., E. coli cells). Nucleic acids of the library are expressed in the cells. The cells are evaluated for a phenotype of interest, and cells in which the phenotype is altered relative to a control are isolated. The library nucleic acids in such cells are recovered, and the zinc finger protein encoded by such recovered nucleic acids can be further characterized, utilized, or modified. The target DNA site bound by the zinc finger protein can also be recovered and characterized. In one embodiment, the genes that include the target DNA sites are identified, thereby revealing genes involved in modulation of the phenotype of interest.

[0104] Chimeric zinc finger proteins that include one, two, three, four, or more zinc finger domains can be used to regulate gene expression in prokaryotic cells. These zinc finger proteins can include two or more naturally-occurring zinc finger proteins.

[0105] Zinc finger proteins may also be engineered to recognize a target DNA site in a prokaryotic cell. Useful target sites include sites in a regulatory region of the target gene or within 1 kb or 500 bp of a regulatory region of a target gene. For example, the target site can be within 1 kb or 500 bp of a transcriptional start site of a gene. One method for designing a zinc finger protein includes parsing target sites into 3 or 4 basepair sequences that can be recognized by an individual zinc finger domain. Then a nucleic acid is constructed which includes a sequence that encodes a protein that has consecutive zinc finger domains corresponding to the parsed elements. A plurality of different nucleic acids that encode candidate proteins is constructed and expressed in a host cell. The expression of the target gene is evaluated to identify one or more of the candidates that are able to regulate expression of the target gene.
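The parsing step can be sketched as follows. This is a minimal illustration that assumes non-overlapping subsites of fixed width; the function name `parse_target_site` and the example site are hypothetical, and the sketch does not address which zinc finger domain recognizes which subsite or the orientation of the protein on the DNA.

```python
def parse_target_site(site: str, width: int = 3) -> list:
    """Split a target DNA site into consecutive subsites of `width` basepairs."""
    site = site.upper()
    if width not in (3, 4):
        raise ValueError("width is expected to be 3 or 4 basepairs")
    unexpected = set(site) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in site: {sorted(unexpected)}")
    if len(site) % width != 0:
        raise ValueError("site length must be a multiple of the subsite width")
    return [site[i:i + width] for i in range(0, len(site), width)]


if __name__ == "__main__":
    # A 9-basepair site parses into three 3-basepair subsites,
    # one per zinc finger domain of a three-finger protein.
    print(parse_target_site("GCGTGGGCG"))
```

Each subsite would then be matched to a candidate zinc finger domain, and the domains joined consecutively to give a candidate protein for testing in a host cell.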

[0106] In one aspect of the invention, a library of nucleic acids that encode different artificial, chimeric polypeptides is screened to identify a chimeric protein that alters a phenotypic trait of a prokaryotic cell. The artificial polypeptide can be identified without a priori knowledge of a particular target gene or pathway.

[0107] Library Construction

[0108] The nucleic acid library is constructed so that it includes nucleic acids that each encode and can express an artificial polypeptide that is a chimera of one or more structural domains (e.g., zinc finger domains). The zinc finger domains are nucleic acid binding domains that can vary in specificity such that the library encodes a population of proteins with different binding specificities.

[0109] Zinc fingers. Zinc fingers are small polypeptide domains of approximately 30 amino acid residues in which there are four amino acids, either cysteine or histidine, appropriately spaced such that they can coordinate a zinc ion (for reviews, see, e.g., Klug and Rhodes, (1987) Trends Biochem. Sci. 12:464-469; Evans and Hollenberg, (1988) Cell 52:1-3; Payre and Vincent, (1988) FEBS Lett. 234:245-250; Miller et al., (1985) EMBO J. 4:1609-1614; Berg, (1988) Proc. Natl. Acad. Sci. U.S.A. 85:99-102; Rosenfeld and Margalit, (1993) J. Biomol. Struct. Dyn. 11:557-570). Hence, zinc finger domains can be categorized according to the identity of the residues that coordinate the zinc ion, e.g., as the Cys₂-His₂ class, the Cys₂-Cys₂ class, the Cys₂-CysHis class, and so forth. The zinc coordinating residues of Cys₂-His₂ zinc fingers are typically spaced as follows: X_(a)-X-C-X₂₋₅-C-X₃-X_(a)-X₅-ψ-X₂-H-X₃₋₅-H (SEQ ID NO:66), where ψ (psi) is a hydrophobic residue (Wolfe et al., (1999) Annu. Rev. Biophys. Biomol. Struct. 3:183-212), “X” represents any amino acid, X_(a) is phenylalanine or tyrosine, a subscript indicates the number of amino acids, and a subscript with two hyphenated numbers indicates a typical range of intervening amino acids. Typically, the intervening amino acids fold to form an anti-parallel β-sheet that packs against an α-helix, although the anti-parallel β-sheets can be short, non-ideal, or non-existent. The fold positions the zinc-coordinating side chains so they are in a tetrahedral conformation appropriate for coordinating the zinc ion. The base contacting residues are at the N-terminus of the α-helix and in the preceding loop region.
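The typical Cys₂-His₂ spacing can be written as a regular expression for scanning one-letter protein sequences. The sketch below is illustrative only: the set of residues treated as hydrophobic (`HYDROPHOBIC`) is an assumption, and the relaxed pattern that keeps only the cysteine and histidine spacing is a hypothetical fallback, not part of the motif as stated. The Zif268 fragment of SEQ ID NO:65 is used as test input.

```python
import re

# Residues treated as hydrophobic at the psi position (an assumption).
HYDROPHOBIC = "AILMFVWY"

# X_a-X-C-X(2-5)-C-X3-X_a-X5-psi-X2-H-X(3-5)-H, with X_a = Phe or Tyr.
STRICT = re.compile(
    r"[FY].C.{2,5}C.{3}[FY].{5}[" + HYDROPHOBIC + r"].{2}H.{3,5}H"
)

# Relaxed fallback: only the spacing of the zinc-coordinating residues.
RELAXED = re.compile(r"C.{2,5}C.{12}H.{3,5}H")

ZIF268 = (
    "ERPYACPVES" "CDRRFSRSDE" "LTRHIRIHTG" "QKPFQCRICM" "RNFSRSDHLT"
    "THIRTHTGEK" "PFACDICGRK" "FARSDERKRH" "TKIHLRQKD"
)


def find_fingers(seq, pattern=STRICT):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print("strict: ", [start for start, _ in find_fingers(ZIF268, STRICT)])
    print("relaxed:", [start for start, _ in find_fingers(ZIF268, RELAXED)])
```

With this choice of hydrophobic residues, the strict pattern finds the first two fingers of the fragment but not the third, which has arginine at the ψ position, whereas the relaxed pattern finds all three. This is consistent with the spacing being described as typical rather than invariant.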

[0110] For convenience, the primary DNA contacting residues of a zinc finger domain are numbered: −1, 2, 3, and 6 based on the following example:

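[0111] −1 1 2 3 4 5 6 (position numbers for the seven residues C-X-S-N-X_(b)-X-R that immediately precede the first histidine in the sequence below)

[0112] X_(a)-X-C-X₂₋₅-C-X₃-X_(a)-X-C-X-S-N-X_(b)-X-R-H-X₃₋₅-H (SEQ ID NO: 123),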

[0113] where X_(a) is typically phenylalanine or tyrosine, and X_(b) is typically a hydrophobic residue. As noted in the example above, the DNA contacting residues are Cys (C), Ser (S), Asn (N), and Arg (R), at positions −1, 2, 3, and 6, respectively. The above motif can be abbreviated CSNR. As used herein, such an abbreviation refers to a class of sequences which include a domain corresponding to the motif as well as a species whose sequence includes a particular polypeptide sequence, typically a sequence listed in Table 1 or Table 3 that conforms to the motif. Where two sequences in Table 1 or Table 3 have the same motif, a number may be used to indicate the sequence.

[0114] A zinc finger protein typically consists of a tandem array of three or more zinc finger domains. For example, zinc finger domains whose motifs are listed consecutively are not interspersed with other folded domains, but may include a linker, e.g., a flexible linker described herein between domains. For an implementation that includes a specific zinc finger protein or array thereof described herein, the invention also features a related implementation that includes a corresponding zinc finger protein or array thereof having an array with zinc fingers that have the same DNA contacting residues as the specific zinc finger protein or array thereof. The corresponding zinc finger protein may differ by at least one, two, three, four, or five amino acids from the disclosed specific zinc finger protein, e.g., at an amino acid position that is not a DNA contacting residue. Other related implementations include a corresponding protein that has at least one, two, or three zinc fingers that have the same DNA contacting residues, e.g., in the same order.

[0115] The zinc finger domain (or “ZFD”) is one of the most common eukaryotic DNA-binding motifs, found in species from yeast to higher plants and to humans. By one estimate, there are at least several thousand zinc finger domains in the human genome alone, possibly at least 4,500. Zinc finger domains can be isolated from zinc finger proteins. Non-limiting examples of zinc finger proteins include CF2-II, Kruppel, WT1, basonuclin, BCL-6/LAZ-3, erythroid Kruppel-like transcription factor, Sp1, Sp2, Sp3, Sp4, transcriptional repressor YY1, EGR1/Krox24, EGR2/Krox20, EGR3/Pilot, EGR4/AT133, Evi-1, GLI1, GLI2, GLI3, HIV-EP1/ZNF40, HIV-EP2, KR1, ZfX, ZfY, and ZNF7.

[0116] Computational methods described below can be used to identify all zinc finger domains encoded in a sequenced genome or in a nucleic acid database. Any such zinc finger domain can be utilized. In addition, artificial zinc finger domains have been designed, e.g., using computational methods (e.g., Dahiyat and Mayo, (1997) Science 278:82-7).

[0117] It is also noteworthy that at least some zinc finger domains bind to ligands other than DNA, e.g., RNA or protein. Thus, a chimera of zinc finger domains or of a zinc finger domain and another type of domain can be used to recognize a variety of target compounds, not just DNA.

[0118] WO 01/60970, U.S. Ser. No. 60/374,355, filed Apr. 22, 2002, and U.S. Ser. No. 10/223,765, filed Aug. 19, 2002, describe exemplary zinc finger domains which can be used to construct an artificial zinc finger protein. See also Table 3, below.

[0119] A variety of other structural domains are known to bind nucleic acids with high affinity and high specificity. For reviews of structural motifs which recognize double stranded DNA, see, e.g., Pabo and Sauer (1992) Annu. Rev. Biochem. 61:1053-95; Patikoglou and Burley (1997) Annu. Rev. Biophys. Biomol. Struct. 26:289-325; Nelson (1995) Curr. Opin. Genet. Dev. 5:180-9.

[0120] Identification of zinc finger domains. A variety of methods can be used to identify zinc finger domains. Nucleic acids encoding identified domains are used to construct the nucleic acid library. Further, nucleic acids encoding these domains can also be varied (e.g., mutated) to provide additional domains that are encoded by the library.

[0121] Computational Methods. To identify additional naturally-occurring structural domains (e.g., zinc finger domains), the amino acid sequence of a known zinc finger domain can be compared to a database of known sequences, e.g., an annotated database of protein or nucleic acid sequences. In another implementation, databases of uncharacterized sequences, e.g., unannotated genomic, EST or full-length cDNA sequence; of characterized sequences, e.g., SwissProt or PDB; and of domains, e.g., Pfam, ProDom (Corpet et al. (2000) Nucleic Acids Res. 28:267-269), and SMART (Simple Modular Architecture Research Tool, Letunic et al. (2002) Nucleic Acids Res 30, 242-244) can provide a source of zinc finger domain sequences. Nucleic acid sequence databases can be translated in all six reading frames for the purpose of comparison to a query amino acid sequence. Nucleic acid sequences that are flagged as encoding candidate nucleic acid binding domains can be amplified from an appropriate nucleic acid source, e.g., genomic DNA or cellular RNA. Such nucleic acid sequences can be cloned into an expression vector. The procedures for computer-based domain identification can be interfaced with an oligonucleotide synthesizer and robotic systems to produce nucleic acids encoding the domains in a high-throughput platform. Cloned nucleic acids encoding the candidate domains can also be stored in a host expression vector and shuttled easily into an expression vector, e.g., into a translational fusion vector with other domains (of a similar or different type), either by restriction enzyme mediated subcloning or by site-specific, recombinase mediated subcloning (see U.S. Pat. No. 5,888,732). The high-throughput platform can be used to generate multiple microtitre plates containing nucleic acids encoding different candidate chimeras.
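Translation of a nucleic acid sequence in all six reading frames can be sketched as below. The sketch assumes the standard genetic code and unambiguous DNA input; codons containing other characters are reported as "X". The function names and frame labels ("+1" to "+3", "-1" to "-3") are hypothetical conventions chosen for this illustration.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the start of `dna`; '*' marks a stop."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Return translations keyed by frame label, e.g. '+1' or '-2'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTGTGATATTTGT").items():
        print(label, protein)
```

Each translated frame can then be compared to a query amino acid sequence or scanned with a domain pattern or profile.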

[0122] Detailed methods for the identification of domains from a starting sequence or a profile are well known in the art. See, for example, Prosite (Hofmann et al., (1999) Nucleic Acids Res. 27:215-219), FASTA, BLAST (Altschul et al., (1990) J. Mol. Biol. 215:403-10), etc. A simple string search can be done to find amino acid sequences with identity to a query sequence or a query profile, e.g., using Perl to scan text files. Sequences so identified can be about 30%, 40%, 50%, 60%, 70%, 80%, 90%, or greater identical to an initial input sequence.
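A simple string search of this kind can be sketched in a few lines. The example below is written in Python rather than Perl and is only a minimal, ungapped illustration: it slides a query along a target sequence and reports windows whose percent identity meets a threshold. The function names and the default threshold are assumptions made for this sketch; tools such as BLAST or FASTA handle gaps and scoring matrices.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that are identical in two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    # Positions where both sequences carry a gap character do not count.
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def scan(query: str, target: str, min_identity: float = 80.0):
    """Return (1-based start, percent identity) for each qualifying window."""
    hits = []
    width = len(query)
    for i in range(len(target) - width + 1):
        identity = percent_identity(query, target[i:i + width])
        if identity >= min_identity:
            hits.append((i + 1, identity))
    return hits


if __name__ == "__main__":
    # Compare the helix regions of two zinc finger domains.
    print(round(percent_identity("RSDELTR", "RSDHLTT"), 1))
    print(scan("RSDELTR", "FSRSDELTRHIRIH", min_identity=100.0))
```

Lowering `min_identity` toward the 30% to 90% range mentioned above widens the set of candidate sequences returned.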

[0123] Domains similar to a query domain can be identified from a public database, e.g., using the XBLAST programs (version 2.0) of Altschul et al., (1990) J. Mol. Biol. 215:403-10. For example, BLAST protein searches can be performed with the XBLAST parameters as follows: score=50, wordlength=3. Gaps can be introduced into the query or searched sequence as described in Altschul et al., (1997) Nucleic Acids Res. 25(17): 3389-3402. Default parameters for XBLAST and Gapped BLAST programs are available at National Center for Biotechnology Information (NCBI), National Institutes of Health, Bethesda Md.

[0124] The Prosite profiles PS00028 and PS50157 can be used to identify zinc finger domains. In a SWISSPROT release of 80,000 protein sequences, these profiles detected 3189 and 2316 zinc finger domains, respectively. Profiles can be constructed from a multiple sequence alignment of related proteins by a variety of different techniques. Gribskov and co-workers (Gribskov et al., (1990) Meth. Enzymol. 183:146-159) utilized a symbol comparison table to convert a multiple sequence alignment supplied with residue frequency distributions into weights for each position. See, for example, the PROSITE database and the work of Luethy et al., (1994) Protein Sci. 3:139-146.

[0125] Hidden Markov Models (HMMs) representing a DNA binding domain of interest can be generated or obtained from a database of such models, e.g., the Pfam database, release 2.1. A database can be searched, e.g., using the default parameters, with the HMM in order to find additional domains (see, e.g., Bateman et al. (2002) Nucleic Acids Research 30:276-280). Alternatively, the user can optimize the parameters. A threshold score can be selected to filter the database of sequences such that sequences that score above the threshold are displayed as candidate domains. A description of the Pfam database can be found in Sonnhammer et al., (1997) Proteins 28(3): 405-420, and a detailed description of HMMs can be found, for example, in Gribskov et al., (1990) Meth. Enzymol. 183:146-159; Gribskov et al., (1987) Proc. Natl. Acad. Sci. USA 84:4355-4358; Krogh et al., (1994) J. Mol. Biol. 235:1501-1531; and Stultz et al., (1993) Protein Sci. 2:305-314.

[0126] The SMART database of HMMs (Simple Modular Architecture Research Tool, Schultz et al., (1998) Proc. Natl. Acad. Sci. USA 95:5857 and Schultz et al., (2000) Nucl. Acids Res 28:231) provides a catalog of zinc finger domains (ZnF_C2H2; ZnF_C2C2; ZnF_C2HC; ZnF_C3H1; ZnF_C4; ZnF_CHCC; ZnF_GATA; and ZnF_NFX) identified by profiling with the hidden Markov models of the HMMer2 search program (Durbin et al., (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press).

[0127] Hybridization-based Methods. A collection of nucleic acids encoding various forms of a zinc finger domain can be analyzed to profile sequences encoding conserved amino- and carboxy-terminal boundary sequences. Degenerate oligonucleotides can be designed to hybridize to sequences encoding such conserved boundary sequences. Moreover, the efficacy of such degenerate oligonucleotides can be estimated by comparing their composition to the frequency of possible annealing sites in known genomic sequences. If desired, multiple rounds of design can be used to optimize the degenerate oligonucleotides.
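One quantity that is often considered when designing a degenerate oligonucleotide is its degeneracy, i.e., the number of distinct sequences it represents. The sketch below computes this from the standard IUPAC nucleotide ambiguity codes and can enumerate the individual sequences; the function names and the example oligonucleotide are hypothetical and are not taken from the methods described herein.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    count = 1
    for base in oligo.upper():
        count *= len(IUPAC[base])
    return count


def expand(oligo: str) -> list:
    """List every unambiguous sequence represented by `oligo`."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # ACN GGN GAR covers all codons for Thr-Gly-Glu (4 x 4 x 2 sequences).
    print(degeneracy("ACNGGNGAR"))
    print(expand("GAR"))
```

The expanded sequences could, for example, be counted against known genomic sequences to estimate how often annealing sites occur.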

[0128] Comparison of known Cys₂-His₂ zinc fingers, for example, revealed a common sequence in the linker region between adjacent fingers in natural sequence (Agata et al., (1998) Gene 213:55-64). Degenerate oligonucleotides that anneal to nucleic acid encoding the conserved linker region were used to amplify a plurality of zinc finger domains. The amplified nucleic acid encoding the domains can be used to construct nucleic acids that encode a chimeric array of zinc fingers.

[0129] Nucleic Acids Encoding Zinc Finger Domains

[0130] Nucleic acids that are used to assemble the library can be obtained by a variety of methods. Some component nucleic acids of the library can encode naturally occurring zinc finger domains. In addition, some component nucleic acids are variants that are obtained by mutation or other randomization methods. The component nucleic acids, typically encoding just a single domain, can be joined to each other to produce nucleic acids encoding a fusion of the different zinc finger domains.

[0131] Isolation of a natural repertoire of domains. A library of domains can be constructed by isolation of nucleic acid sequences encoding domains from genomic DNA or cDNA of eukaryotic organisms such as yeasts or humans. Multiple methods are available for doing this. For example, a computer search of available amino acid sequences can be used to identify the domains, as described above. A nucleic acid encoding each domain can be isolated and inserted into a vector appropriate for expression in cells, e.g., a vector containing a promoter, an activation domain, and a selectable marker. In another example, degenerate oligonucleotides that hybridize to a conserved motif are used to amplify, e.g., by PCR, a large number of related domains containing the motif. For example, Kruppel-like Cys₂His₂ zinc fingers can be amplified by the method of Agata et al., (1998) Gene 213:55-64. This method also maintains the naturally occurring zinc finger domain linker peptide sequences, e.g., sequences with the pattern: Thr-Gly-(Glu/Gln)-(Lys/Arg)-Pro-(Tyr/Phe) (SEQ ID NO: 122). Moreover, screening a collection limited to domains of interest, unlike screening a library of unselected genomic or cDNA sequences, significantly decreases library complexity and reduces the likelihood of missing a desirable sequence due to the inherent difficulty of completely screening large libraries.
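
As an illustration of how the linker pattern quoted above could be located in a protein sequence, the following sketch writes the pattern in one-letter code as a regular expression. The test fragment is hypothetical and the helper name is an assumption made for this example.

```python
import re

# Pattern from the text, in one-letter code:
# Thr-Gly-(Glu/Gln)-(Lys/Arg)-Pro-(Tyr/Phe)
LINKER = re.compile(r"TG[EQ][KR]P[YF]")

def find_linkers(protein):
    """Return (start index, matched text) for each linker-like motif."""
    return [(m.start(), m.group()) for m in LINKER.finditer(protein)]

# Hypothetical fragment: end of one finger, a linker, start of the next.
fragment = "HQRTHTGEKPYKCEECGKAF"
print(find_linkers(fragment))
```

Linker positions found this way mark candidate boundaries between adjacent fingers, which is the same feature the degenerate primers exploit at the nucleic acid level.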

[0132] The human genome contains numerous zinc finger domains, many of which are uncharacterized and unidentified. It is estimated that there are thousands of genes encoding proteins with zinc finger domains (Pellegrino and Berg, (1991) Proc. Natl. Acad. Sci. USA 88:671-675). These human zinc finger domains represent an extensive collection of diverse domains from which novel DNA-binding proteins can be constructed. Many exemplary human zinc finger domains are described in WO 01/60970, U.S. Ser. No. 60/374,355, filed Apr. 22, 2002, and U.S. Ser. No. 10/223,765, filed Aug. 19, 2002. See also Table 3 below.

TABLE 3
Exemplary Zinc Finger Domains

ZFD     Amino acid sequence         SEQ ID NO:   Target subsite(s)
CSNR1   YKCKQCGKAFGCPSNLRRHGRTH     67           GAA > GAC > GAG
CSNR2   YQCNICGKCFSCNSNLHRHQRTH     68           GAA > GAC > GAG
DSAR2   YSCGICGKSFSDSSAKRRHCILH     69           GTC
DSCR    YTCSDCGKAFRDKSCLNRHRRTH     70           GCC
HSNK    YKCKECGKAFNHSSNFNKHHRIH     71           GAC
HSSR    FKCPVCGKAFRHSSSLVRHQRTH     72           GTT
ISNR    YRCKYCDRSFSISSNLQRHVRNIH    73           GAA > GAT > GAC
ISNV    YECDHCGKAFSIGSNLNVHRRIH     74           AAT
KSNR    YGCHLCGKAFSKSSNLRRHEMIH     75           GAG
QAHR    YKCKECGQAFRQRAHLIRHHKLH     76           GGA
QFNR    YKCHQCGKAFIQSFNLRRHERTH     77           GAG
QGNR    FQCNQCGASFTQKGNLLRHIKLH     78           GAA
QSHR1   YACHLCGKAFTQSSHLRRHEKTH     79           GGA > GAA > AGA
QSHR2   YKCGQCGKFYSQVSHLTRHQKIH     80           GGA
QSHR3   YACHLCGKAFTQCSHLRRHEKTH     81           GGA > GAA
QSHR4   YACHLCAKAFIQCSHLRRHEKTH     82           GGA > GAA
QSHR5   YVCRECGRGFRQHSHLVRHKRTH     83           GGA > AGA > GAA > CGA
QSHT    YKCEECGKAFRQSSHLTTHKIIH     84           AGA, CGA > TGA > GGA
QSHV    YECDHCGKSFSQSSHLNVHKRTH     85           CGA > AGA > TGA
QSNI    YMCSECGRGFSQKSNLIIHQRTH     86           AAA, CAA
QSNK    YKCEECGKAFTQSSNLTKHKKIH     87           GAA > TAA > AAA
QSNR1   FECKDCGKAFIQKSNLIRHQRTH     88           GAA
QSNR2   YVCRECRRGFSQKSNLIRHQRTH     89           GAA
QSNR3   YECEKCGKAFNQSSNLTRHKKSH     90           GAA
QSNV1   YECNTCRKTFSQKSNLIVHQRTH     91           AAA > CAA
QSNV2   YVCSKCGKAFTQSSNLTVHQKIH     92           AAA > CAA
QSNV3   YKCDECGKNFTQSSNLIVHKRIH     93           AAA
QSNV4   YECDVCGKTFTQKSNLGVHQRTH     94           AAA
QSNT    YECVQCGKGFTQSSNLITHQRVH     95           AAA
QSSR1   YKCPDCGKSFSQSSSLIRHQRTH     96           GTA > GCA
QSSR2   YECQDCGRAFNQNSSLGRHKRTH     97           GTA
QSSR3   YECNECGKFFSQSSSLIRHRRSH     98           GTA > GCA
QSTR    YKCEECGKAFNQSSTLTRHKIVH     99           GTA > GCA
QSTV    YECNECGKAFAQNSTLRVHQRIH     100          ACA
QTHQ    YECHDCGKSFRQSTHLTQHRRIH     101          AGA > CGA, TGA
QTHR1   YECHDCGKSFRQSTHLTRHRRIH     102          GGA > AGA, GAA
QTHR2   HKCLECGKCFSQNTHLTRHQRT      103          GGA
RDER1   YVCDVEGCTWKFARSDELNRHKKRH   104          GCG > GTG, GAC
RDER2   YHCDWDGCGWKFARSDELTRHYRKH   105          GCG > GTG
RDER3   YRCSWEGCEWRFARSDELTRHFRKH   106          GCG > GTG
RDER4   FSCSWKGCERRFARSDELSRHRRTH   107          GCG > GTG
RDER5   FACSWQDCNKKFARSDELARHYRTH   108          GCG
RDER6   YHCNWDGCGWKFARSDELTRHYRKH   109          GCG > GTG
RDHR1   FLCQYCAQRFGRKDHLTRHMKKSH    110          GAG, GGG
RDHT    FQCKTCQRKFSRSDHLKTHTRTH     111          AGG, CGG, GGG, TGG
RDKI    FACEVCGVRFTRNDKLKIHMRKH     112          GGG
RDKR    YVCDVEGCTWKFARSDKLNRHKKRH   113          GGG > AGG
RSHR    YKCMECGKAFNRRSHLTRHQRIH     114          GGG
RSNR    YICRKCGRGFSRKSNLIRHQRTH     115          GAG > GTG
RTNR    YLCSECDKCFSRSTNLIRHRRTH     116          GAG
SSNR    YECKECGKAFSSGSNFTRHQRIH     117          GAG > GAC
VSNV    YECDHCGKAFSVSSNLNVHRRIH     118          AAT > CAT > TAT
VSSR    YTCKQCGKAFSVSSSLRRHETTH     119          GTT > GTG > GTA
VSTR    YECNYCGKTFSVSSTLIRHQRIH     120          GCT > GCG
WSNR    YRCEECGKAFRWPSNLTRHKRIH     121          GGT > GGA

[0133] If each zinc finger domain recognizes a unique 3- to 4-bp sequence, the total number of domains required to bind every possible 3- to 4-bp sequence is only 64 to 256 (4³ to 4⁴). It is possible that the natural repertoire of the human genome contains a sufficient number of unique zinc finger domains to span all possible recognition sites. These zinc finger domains are a valuable resource for constructing artificial chimeric DNA-binding proteins. A nucleic acid library can include nucleic acids encoding proteins that include naturally occurring zinc finger domains, artificial mutants of such domains, and combinations thereof.
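
The counts of 64 and 256 follow from enumerating every sequence of length 3 or 4 over the four bases. A minimal sketch, with hypothetical function names, that also reports which subsites remain uncovered by a given set of domains:

```python
from itertools import product

BASES = "ACGT"

def all_subsites(length):
    """Every possible DNA subsite of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]

def uncovered(subsites_bound, length=3):
    """Subsites of the given length not bound by any listed domain."""
    return sorted(set(all_subsites(length)) - set(subsites_bound))

print(len(all_subsites(3)), len(all_subsites(4)))

# Example input: the three subsites listed for CSNR1 in Table 3.
print(len(uncovered(["GAA", "GAC", "GAG"])))
```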

[0134] Mutated Domains. In one implementation, the library includes nucleic acids encoding at least one structural domain that is an artificial variant of a naturally-occurring sequence. In one embodiment, such variant domains are assembled from a degenerate patterned library. In the case of a nucleic acid binding domain, positions in close proximity to the nucleic acid binding interface or adjacent to a position so located can be targeted for mutagenesis. A mutated test zinc finger domain, for example, can be constrained at any mutated position to a subset of possible amino acids by using a patterned degenerate library. Degenerate codon sets can be used to encode the profile at each position. For example, codon sets are available that encode only hydrophobic residues, aliphatic residues, or hydrophilic residues. The library can be selected for full-length clones that encode folded polypeptides. Cho et al. ((2000) J. Mol. Biol. 297(2): 309-19) provide a method for producing such degenerate libraries using degenerate oligonucleotides, and also provide a method of selecting library nucleic acids that encode full-length polypeptides. Such nucleic acids can be easily inserted into an expression plasmid, e.g., using convenient restriction enzyme cleavage sites.

[0135] Selection of the appropriate codons and the relative proportions of each nucleotide at a given position can be determined by simple examination of a table representing the genetic code, or by computational algorithms. For example, Cho et al., supra, describe a computer program that accepts a desired profile of protein sequence and outputs a preferred oligonucleotide design that encodes the sequence.
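
The table lookup described above can be automated in a simple way. The sketch below is not the program of Cho et al.; it is a minimal illustration, assuming the standard genetic code and IUPAC ambiguity codes, that reports which residues (and whether a stop, shown as `*`) a degenerate codon encodes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def encoded_residues(degenerate_codon):
    """Sorted one-letter residues ('*' = stop) encoded by a degenerate codon."""
    pools = [IUPAC[b] for b in degenerate_codon.upper()]
    return sorted({CODON_TABLE["".join(c)] for c in product(*pools)})

print(encoded_residues("NTN"))       # a hydrophobic-only codon set
print(len(encoded_residues("NNK")))  # all residues plus one stop codon
```

A design program would run this mapping in reverse, searching degenerate codons for the one whose encoded set best matches the desired profile at each position.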

[0136] See also Zhang et al., (2000) J. Biol. Chem. 275:33850-33860; Rebar and Pabo (1994) Science 263:671-673; Segal (1999) Proc. Natl. Acad. Sci. USA 96:2758; Gogus et al., (1996) Proc. Natl. Acad. Sci. USA 93:2159-2164; Dreier et al., (2001) J. Biol. Chem. 276: 29466-29478; Liu et al. (2001) J. Biol. Chem. 276(14): 11323-11334; and Hsu et al., (1992) Science 257:1946-50 for some available zinc finger domains.

[0137] In one embodiment, a chimeric protein can include one or more of the zinc finger domains that have at least 18, 19, 20, 21, 22, 23, 24, or 25 amino acids that are identical to a zinc finger domain sequence in Table 1 or Table 3, or are at least 70, 75, 80, 85, 90, or 95% identical to a zinc finger domain sequence in Table 1 or Table 3. For example, the DNA contacting residues can be identical.
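
One simple way to evaluate such identity thresholds is a position-by-position comparison of two domains of equal length. The sketch below assumes an ungapped alignment, which is a simplification; the two sequences are RDER2 and RDER6 from Table 3.

```python
def identical_positions(a, b):
    """Count positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b))

def percent_identity(a, b):
    """Percent identity over the full (ungapped) length."""
    return 100.0 * identical_positions(a, b) / len(a)

RDER2 = "YHCDWDGCGWKFARSDELTRHYRKH"
RDER6 = "YHCNWDGCGWKFARSDELTRHYRKH"

print(identical_positions(RDER2, RDER6), percent_identity(RDER2, RDER6))
```

These two domains differ only at the fourth residue, so they share 24 of 25 positions.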

[0138] Construction of Chimeric Zinc Finger Proteins

[0139] A library of nucleic acids encoding diverse chimeric zinc finger proteins can be formed by serial ligation, e.g., as described in Example 1. The library can be constructed such that each nucleic acid encodes a protein that has at least three, four, or five zinc finger domains. In some implementations, particularly for large libraries, each zinc finger coding segment can be designed to randomly encode any one of a set of zinc finger domains. The set of zinc finger domains can be selected to represent domains with a range of specificities, e.g., covering 30, 40, 50 or more of the 64 possible 3-basepair subsites. The set can include at least about 12, 15, 20, 25, 30, 40 or 50 different zinc finger domains. Some or all of these domains can be domains isolated from naturally occurring proteins. Moreover, because there may be little or no need for more than one zinc finger domain for a given 3-basepair subsite, it may be possible to generate a library using a small number of component domains, e.g., fewer than 500, 200, 100, or even fewer than 64 total component domains.

[0140] One exemplary library includes nucleic acids that encode a chimeric zinc finger protein having three fingers and 30 possible domains at each finger position. In its fully represented form, this library includes 27,000 sequences (i.e., the result of 30³). The library can be constructed by serial ligation in which a nucleic acid from a pool of nucleic acids encoding all 30 possible domains is added at each step.

[0141] In one embodiment, the library can be stored as a random collection. In another embodiment, individual members can be isolated, stored at an addressable location (e.g., arrayed), and sequenced. After high throughput sequencing of 40 to 50 thousand constructed library members, missing chimeric combinations can be individually assembled in order to obtain complete coverage. Once arrayed, e.g., in microtitre plates, each individual member can be recovered later for further analysis, e.g., for a phenotypic screen. For example, equal amounts of each arrayed member can be pooled and then transformed into a cell. Cells with a desired phenotype are selected and characterized. In another example, each member is individually transformed into a cell, and the cell is characterized, e.g., using a nucleic acid microarray to determine if the transcription of endogenous genes is altered (see “Profiling Regulatory Properties of a Chimeric Zinc Finger Protein,” below).
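
The bookkeeping for complete coverage can be sketched as a set difference between the full combinatorial library and the members observed by sequencing. The domain names below are placeholders, and the representation of a library member as a tuple of domain names is an assumption made for this illustration.

```python
from itertools import product

def full_library(domains, fingers=3):
    """All ordered combinations of domains across the finger positions."""
    return set(product(domains, repeat=fingers))

def missing_members(domains, sequenced, fingers=3):
    """Combinations absent from the sequenced set, to be assembled individually."""
    return sorted(full_library(domains, fingers) - set(sequenced))

# 30 placeholder domain names at each of three finger positions.
domains = [f"D{i:02d}" for i in range(1, 31)]
print(len(full_library(domains)))

# Toy case with two domains and two fingers, one combination not yet seen.
seen = [("A", "A"), ("A", "B"), ("B", "A")]
print(missing_members(["A", "B"], seen, fingers=2))
```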

[0142] Introducing Nucleic Acid Libraries into Cells

[0143] Library nucleic acids can be introduced into cells by a variety of methods. In one example, the library is stored as a random pool including multiple replicates of each library nucleic acid. An aliquot of the pool is transformed into cells. In another embodiment, individual library members are stored separately (e.g., in separate wells of a microtitre plate or at separate addresses of an array) and are individually introduced into cells.

[0144] In still another embodiment, the library members are stored in pools that have a reduced complexity relative to the library as a whole. For example, each pool can include 10³ different library members from a library of 10⁵ or 10⁶ different members. When a pool is identified as having a member that causes a particular effect, the pool is deconvolved to identify the individual library member that mediates the phenotypic effect. This approach is useful when recovery of the altered cell is difficult, e.g., in a screen for chimeric proteins that cause apoptosis.
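
The text does not specify how a positive pool is deconvolved. One possible strategy, shown here purely as an assumption, is to split a positive pool into sub-pools, re-test each, and repeat until single members remain. The `is_positive` callback stands in for the actual assay and is hypothetical.

```python
def make_pools(members, pool_size):
    """Split a list of library members into consecutive pools."""
    return [members[i:i + pool_size] for i in range(0, len(members), pool_size)]

def deconvolve(pool, is_positive, fanout=10):
    """Recursively subdivide a positive pool until single members remain.

    Assumption: is_positive(sub_pool) re-runs the assay on a sub-pool and
    returns True if any member of it causes the effect.
    """
    if len(pool) == 1:
        return list(pool)
    size = max(1, -(-len(pool) // fanout))  # ceiling division
    hits = []
    for sub in make_pools(pool, size):
        if is_positive(sub):
            hits.extend(deconvolve(sub, is_positive, fanout))
    return hits

# A library of 10^5 members in pools of 10^3 gives 100 pools.
library = list(range(100_000))
pools = make_pools(library, 1_000)
assay = lambda pool: any(member == 4242 for member in pool)  # toy assay
positive = [p for p in pools if assay(p)]
print(len(pools), deconvolve(positive[0], assay))
```

With a fanout of 10, a pool of 10³ members is resolved in three rounds of at most ten assays each, rather than by testing all members individually.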

[0145] Exemplary methods for introducing library nucleic acids into cells include electroporation (see, e.g., U.S. Pat. No. 5,384,253); microprojectile bombardment techniques (see, e.g., U.S. Pat. Nos. 5,550,318; 5,538,880; and 5,610,042; and WO 94/09699); liposome-mediated transfection (e.g., using LIPOFECTAMINE™ (Invitrogen) or SUPERFECT™ (QIAGEN GmbH); see, e.g., Nicolau et al., Methods Enzymol., 149:157-176, 1987); calcium phosphate or DEAE-dextran mediated transformation (see, e.g., Rippe et al., (1990) Mol. Cell Biol., 10:689-695); direct microinjection or sonication loading; receptor mediated transfection (see, e.g., EP 273 085); and Agrobacterium-mediated transformation (see, e.g., U.S. Pat. Nos. 5,563,055 and 5,591,616). The term “transform,” as used herein, encompasses any method that introduces an exogenous nucleic acid into a cell.

[0146] It is also possible to use a viral particle to deliver a library nucleic acid into a cell in vitro or in vivo. In one embodiment, viral packaging is used to deliver the library nucleic acids to cells within an organism. In another embodiment, the library nucleic acids are introduced into cells in vitro, after which the cells are transferred into an organism.

[0147] After introduction of the library nucleic acids, the library nucleic acids are expressed so that the chimeric proteins encoded by the library are produced by the cells. Constant regions of the library nucleic acid can provide necessary regulatory and supporting sequences to enable expression. Such sequences can include transcriptional promoters, transcription terminators, bacterial origins of replication, and markers for indicating the presence of the library nucleic acid or for selection of the library nucleic acid.

[0148] Screening Nucleic Acid Libraries Encoding Chimeric Proteins

[0149] In a screen, the cells are evaluated to identify ones that have an altered phenotype. This process can be adapted to the phenotype of interest. As the number of possible phenotypes is vast, so too are the possibilities for screening. Numerous genetic screens and selections have been conducted to identify mutants or overexpressed naturally occurring genes that result in particular phenotypes. Any of these methods can be adapted to identify useful members of a nucleic acid library encoding chimeric proteins. A screen can include evaluating each cell that includes a library nucleic acid or a selection, e.g., evaluating cells or organisms that survive or otherwise withstand a particular treatment.

[0150] Exemplary methods for evaluating cells include microscopy (e.g., light, confocal, fluorescence, scanning electron, and transmission electron), fluorescence-based cell sorting, differential centrifugation, differential binding, immunoassays, enzymatic assays, growth assays, and in vivo assays.

[0151] Some screens involve particular environmental conditions. Cells that are sensitive or resistant to the condition are identified.

[0152] Some screens require detection of a particular behavior of a cell (e.g., morphological changes). In one embodiment, the cells or organisms can be evaluated directly, e.g., by visual inspection, e.g., using a microscope and optionally computer software to automatically detect altered cells. In another embodiment, the cells or organisms can be evaluated using an assay or other indicator associated with the desired phenotype.

[0153] Some screens relate to cell growth. Cells that multiply at a different rate relative to a reference cell (e.g., a normal cell) are identified.

[0154] Changes in cell signaling pathways can be detected by the use of probes correlated with activity or inactivity of the pathway or by observable indications correlated with activity or inactivity of the pathway.

[0155] Some screens relate to production of a compound of interest, e.g., a metabolite, or a secreted protein. For example, cells can be identified that produce an increased amount of a compound. In another example, cells can be identified that produce a reduced amount of a compound, e.g., an undesired byproduct. Cells of interest can be identified by a variety of means, including the use of a responder cell, microarrays, chemical detection assays, and immunoassays.

[0156] Production of Cellular Products.

[0157] The invention features artificial polypeptides (e.g., chimeric zinc finger proteins) that alter the ability of a cell to produce a cellular product, e.g., a protein or metabolite. A cellular product can be an endogenous or heterologous molecule. For example, it is possible to identify an artificial polypeptide that increases the ability of a cell to produce proteins, e.g., particular proteins (e.g., particular endogenous proteins), overexpressed proteins, or heterologous proteins.

[0158] In one embodiment, cells are screened for their ability to produce a reporter protein, e.g., a protein that can be enzymatically or fluorescently detected. In one example, the reporter protein is insoluble when overexpressed in a reference cell. For example, bacterial cells can be screened for artificial polypeptides that reduce inclusion bodies. In another example, the reporter protein is secreted. Cells can be screened, e.g., for higher secretory throughput or proteolytic processing.

[0159] In one embodiment, cells are screened for their ability to alter (e.g., increase or decrease) the activity of two different reporter proteins. The reporter proteins may differ, e.g., by activity, localization (e.g., secreted/cytoplasmic/nuclear), size, solubility, isoelectric point, oligomeric state, post-translational regulation, translational regulation, and transcriptional regulation (e.g., the gene encoding them may be regulated by different regulatory sequences). The invention includes artificial polypeptides (e.g., zinc finger proteins) that alter at least two different reporter genes that differ by these properties, and zinc finger proteins that selectively regulate a reporter gene, or a class of reporter genes defined by one of these properties.

[0160] Because the phenotypic screening method can be used to isolate the artificial polypeptide, it is not necessary to know a priori how the zinc finger protein mediates increased protein production. Possible mechanisms, which can be verified, include alteration of one or more of the following: translation machinery, transcript processing, transcription, secretion, protein degradation, stress resistance, and catalytic activity, e.g., metabolite production. In one example, an artificial polypeptide may modulate expression of one or more enzymes in a metabolic pathway and thereby enhance production of a cellular product such as a metabolite or a protein.

[0161] Iterative Design

[0162] Once a chimeric DNA binding protein is identified, its ability to alter a phenotypic trait of a cell can be further improved by a variety of strategies. Small libraries, e.g., having about 6 to 200 or 50 to 2000 members, or large libraries can be used to optimize the properties of a particular identified chimeric protein.

[0163] In a first exemplary implementation of an iterative design, mutagenesis techniques are used to alter the original chimeric DNA binding protein. The techniques are applied to construct a second library whose members include members that are variants of an original protein, for example, a protein identified from a first library. Examples of these techniques include: error-prone PCR (Leung et al. (1989) Technique 1:11-15), recombination, DNA shuffling using random cleavage (Stemmer (1994) Nature 370:389-391; Coco et al. (2001) Nature Biotech. 19:354), site-directed mutagenesis (Zollner et al. (1987) Nucl Acids Res 10:6487-6504), cassette mutagenesis (Reidhaar-Olson (1991) Methods Enzymol. 208:564-586), incorporation of degenerate oligonucleotides (Griffiths et al. (1994) EMBO J. 13:3245); serial ligation, pooling specific library members from a prefabricated and arrayed library, recombination (e.g., sexual PCR and “DNA Shuffling™” (Maxygen, Inc., CA)), or by combinations of these methods.

[0164] In one embodiment, a library is constructed that mutates a set of amino acid positions. For example, for a chimeric zinc finger protein, the set of amino acid positions may be positions in the vicinity of the DNA contacting residues, but not the DNA contacting residues themselves. In another embodiment, the library varies each encoded domain in a chimeric protein, but to a more limited extent than the initial library from which the chimeric DNA binding protein was identified. For a chimeric zinc finger protein, the nucleic acids that encode a particular domain can be varied among other zinc finger domains whose recognition specificity is known to be similar to that of the domain present in the original chimeric protein.

[0165] Some techniques include generating new chimeric DNA binding proteins from nucleic acids encoding domains of at least two chimeric DNA binding proteins that are known to have a particular functional property. These techniques, which include DNA shuffling and standard domain swapping, create new combinations of domains. See, e.g., U.S. Pat. No. 6,291,242. DNA shuffling can also introduce point mutations in addition to merely exchanging domains. The shuffling reaction is seeded with nucleic acid sequences encoding chimeric proteins that induce a desired phenotype. The nucleic acids are shuffled. A secondary library is produced from the shuffling products and screened for members that induce the desired phenotype, e.g., under similar or more stringent conditions. If the initial library is comprehensive such that chimeras of all possible domain combinations are screened, DNA shuffling of domains isolated from the same initial library may be of no avail. DNA shuffling may be useful in instances where coverage is comprehensive and also in instances where comprehensive screening may not be practical.

[0166] In a second exemplary implementation of an iterative design, a chimeric DNA binding protein that produces a desired phenotype is altered by varying each domain. Domains can be varied sequentially, e.g., one-by-one, or greater than one at a time.

[0167] The following example refers to an original chimeric protein that includes three zinc finger domains (fingers I, II, and III) and that produces a desired phenotype. A second library is constructed such that each nucleic acid member of the second library encodes the same finger II and finger III as the initially identified protein. However, the library includes nucleic acid members whose finger I differs from finger I of the original protein. The difference may be a single nucleotide that alters the amino acid sequence of the encoded chimeric protein or may be more substantial. The second library can be constructed, e.g., such that the base-contacting residues of finger I are varied, or that the base-contacting residues of finger I are maintained but that adjacent residues are varied. The second library can also include a large enough set of zinc finger domains to recognize at least 20, 30, 40, or 60 different trinucleotide sites.

[0168] The second library is screened to identify members that alter a phenotype of a cell or organism. The extent of alteration can be similar to that produced by the original protein or greater than that produced by the original protein.

[0169] Concurrently, or subsequently, a third library can be constructed that varies finger II, and a fourth library can be constructed that varies finger III. It may not be necessary to further improve a chimeric protein by varying all domains, if the chimeric protein or already identified variants are sufficient. In other cases, it is desirable to re-optimize each domain.

[0170] If other domains are varied concurrently, improved variants from each particular library can be recombined with each other to generate still another library. This library is similarly screened.

[0171] In a third exemplary implementation of an iterative design, the method includes adding, substituting, or deleting a domain, e.g., a zinc finger domain or a regulatory domain. An additional zinc finger domain may increase the specificity of a chimeric protein and may increase its binding affinity. In some cases, increased binding affinity may enhance the phenotype that the chimeric protein produces. An additional regulatory domain, e.g., a second activation domain or a domain that recruits an accessory factor, may also enhance the phenotype that the chimeric protein produces. A deletion may improve or broaden the specificity of the activity of the chimeric protein, depending on the contribution of the domain that is deleted, and so forth.

[0172] In a fourth exemplary implementation of an iterative design, the method includes co-expressing the original chimeric protein and a second chimeric DNA binding protein in a cell. The second chimeric protein can also be identified by screening a nucleic acid library that encodes different chimeras. In one embodiment, the second chimeric protein is identified by screening the library in a cell that expresses the original chimeric protein. In another embodiment, the second chimeric protein is identified independently.

[0173] Profiling Regulatory Properties of a Chimeric Zinc Finger Protein

[0174] A chimeric polypeptide that alters a phenotype of a cell can be further characterized to identify the endogenous genes that it directly or indirectly regulates. Typically, the chimeric polypeptide is produced within the cell. At an appropriate time, e.g., before, during, or after the phenotypic change occurs, the cell is analyzed to determine the levels of transcripts or proteins present in the cell or in the medium surrounding the cell. For example, mRNA can be harvested from the cell and analyzed using a nucleic acid microarray.

[0175] Nucleic acid microarrays can be fabricated by a variety of methods, e.g., photolithographic methods (see, e.g., U.S. Pat. No. 5,510,270), mechanical methods (e.g., directed-flow methods as described in U.S. Pat. No. 5,384,261), and pin-based methods (e.g., as described in U.S. Pat. No. 5,288,514). The array is synthesized with a unique capture probe at each address, each capture probe being appropriate to detect a nucleic acid for a particular expressed gene.

[0176] Methods for isolating prokaryotic and eukaryotic RNAs are known. Isolated RNAs can be reverse-transcribed and optionally amplified, e.g., by RT-PCR, e.g., as described in U.S. Pat. No. 4,683,202. The nucleic acid can be labeled during amplification or reverse transcription, e.g., by the incorporation of a labeled nucleotide. Examples of preferred labels include fluorescent labels, e.g., red-fluorescent dye Cy5 (Amersham) or green-fluorescent dye Cy3 (Amersham). Alternatively, the nucleic acid can be labeled with biotin, and detected after hybridization with labeled streptavidin, e.g., streptavidin-phycoerythrin (Molecular Probes).

[0177] The labeled nucleic acid is then contacted to the array. In addition, a control nucleic acid or a reference nucleic acid can be contacted to the same array. The control nucleic acid or reference nucleic acid can be labeled with a label different from that of the sample nucleic acid, e.g., one with a different emission maximum. Labeled nucleic acids are contacted to an array under hybridization conditions. The array is washed, and then imaged to detect fluorescence at each address of the array.

[0178] A general scheme for producing and evaluating profiles includes detecting hybridization at each address of the array. The extent of hybridization at an address is represented by a numerical value and stored, e.g., in a vector, a one-dimensional matrix, or one-dimensional array. The vector x has a value for each address of the array. For example, a numerical value for the extent of hybridization at a particular address a is stored in variable x_a. The numerical value can be adjusted, e.g., for local background levels, sample amount, and other variations. Nucleic acid is also prepared from a reference sample and hybridized to the same or a different array. The vector y is constructed identically to vector x. The sample expression profile and the reference profile can be compared, e.g., using a mathematical equation that is a function of the two vectors. The comparison can be evaluated as a scalar value, e.g., a score representing similarity of the two profiles. Either or both vectors can be transformed by a matrix in order to add weighting values to different genes detected by the array.
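
The text leaves the comparison function open. As one possible choice, the sketch below scores two profiles with a weighted cosine similarity, where the per-gene weights play the role of a diagonal weighting matrix. The function names, the background-subtraction step, and the example values are assumptions for illustration.

```python
import math

def background_correct(raw, background):
    """Subtract local background per address, clamping at zero."""
    return [max(r - b, 0.0) for r, b in zip(raw, background)]

def weighted_similarity(x, y, weights=None):
    """Weighted cosine similarity between profile vectors x and y.

    The weights act like the diagonal of a weighting matrix; with no
    weights given, every address counts equally. Assumes neither
    profile is all zeros.
    """
    if weights is None:
        weights = [1.0] * len(x)
    dot = sum(w * a * b for w, a, b in zip(weights, x, y))
    norm_x = math.sqrt(sum(w * a * a for w, a in zip(weights, x)))
    norm_y = math.sqrt(sum(w * b * b for w, b in zip(weights, y)))
    return dot / (norm_x * norm_y)

# Hypothetical intensities at three addresses.
x = background_correct([6.0, 9.0, 4.0], [1.0, 1.0, 1.0])
y = [10.0, 16.0, 6.0]
print(round(weighted_similarity(x, y), 3))
```

A score near 1 indicates profiles of similar shape regardless of overall signal strength; other functions, such as a correlation coefficient or a distance, could be substituted.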

[0179] The expression data can be stored in a database, e.g., a relational database such as a SQL database (e.g., Oracle or Sybase database environments). The database can have multiple tables. For example, raw expression data can be stored in one table, wherein each column corresponds to a gene being assayed, e.g., an address of an array, and each row corresponds to a sample. A separate table can store identifiers and sample information, e.g., the batch number of the array used, date, and other quality control information.
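
A two-table arrangement of this kind can be sketched with Python's built-in sqlite3 module. Note one deliberate difference from the layout described above: instead of one column per gene, this sketch stores one row per sample and gene, which is an assumed alternative that avoids very wide tables. All table names, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    label       TEXT NOT NULL,
    array_batch TEXT,   -- batch number of the array used
    run_date    TEXT    -- date of hybridization
);
CREATE TABLE expression (
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    gene      TEXT NOT NULL,   -- gene (array address) being assayed
    level     REAL NOT NULL,   -- background-corrected signal
    PRIMARY KEY (sample_id, gene)
);
""")

conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?)",
    [(1, "ZFP-expressing", "batch-A", "2002-08-19"),
     (2, "reference", "batch-A", "2002-08-19")],
)
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [(1, "geneA", 8.0), (1, "geneB", 2.0),
     (2, "geneA", 4.0), (2, "geneB", 2.0)],
)

# Join the raw data to the sample information.
rows = conn.execute(
    "SELECT s.label, e.gene, e.level "
    "FROM expression e JOIN sample s USING (sample_id) "
    "ORDER BY s.sample_id, e.gene"
).fetchall()
for row in rows:
    print(row)
```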

[0180] Genes that are similarly regulated can be identified by clustering expression data to identify coregulated genes. Such a cluster may be indicative of a set of genes coordinately regulated by the chimeric zinc finger protein. Genes can be clustered using hierarchical clustering (see, e.g., Sokal and Michener (1958) Univ. Kans. Sci. Bull. 38:1409), Bayesian clustering, k-means clustering, and self-organizing maps (see, Tamayo et al. (1999) Proc. Natl. Acad. Sci. USA 96:2907).

[0181] The similarity of a sample expression profile to a reference expression profile (e.g., a control cell) can also be determined, e.g., by comparing the log of the expression level of the sample to the log of the predictor or reference expression value and adjusting the comparison by the weighting factor for all genes of predictive value in the profile.
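
One reading of this comparison, offered only as an assumption, is a weighted sum over the predictive genes of the difference between the log expression level in the sample and in the reference. The gene names, weights, and choice of base-2 logarithms below are hypothetical.

```python
import math

def weighted_log_score(sample, reference, weights):
    """Sum over predictive genes of weight * (log2 sample - log2 reference).

    Only genes present in `weights` contribute, i.e., the genes taken
    to have predictive value.
    """
    score = 0.0
    for gene, w in weights.items():
        score += w * (math.log2(sample[gene]) - math.log2(reference[gene]))
    return score

sample = {"geneA": 8.0, "geneB": 2.0}
reference = {"geneA": 2.0, "geneB": 2.0}
weights = {"geneA": 1.0, "geneB": 0.5}
print(weighted_log_score(sample, reference, weights))
```

Here geneA is four-fold higher in the sample (a log2 difference of 2) and geneB is unchanged, so the score is 2.0.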

[0182] Proteins can also be profiled in a cell that has an active chimeric protein within it. One exemplary method for profiling proteins includes 2-D gel electrophoresis and mass spectrometry to characterize individual protein species. Individual “spots” on the 2-D gel are proteolyzed and then analyzed on the mass spectrometer. This method can identify both the protein component and, in many cases, post-translational modifications.

[0183] The protein and nucleic acid profiling methods can not only provide information about the properties of the chimeric protein, but also information about natural mechanisms operating within the cell. For example, the proteins or nucleic acids upregulated by expression of the chimeric protein may be the natural effectors of the phenotypic change caused by expression of the chimeric protein.

[0184] In addition, other methods can be used to identify target genes and proteins that are directly or indirectly regulated by the artificial chimeric protein. In one example, alterations that compensate for (e.g., suppress) the phenotypic effect of the artificial chimeric protein are characterized. These alterations include genetic alterations such as mutations in chromosomal genes and overexpression of a particular gene, as well as other alterations.

[0185] In a particular example, a chimeric ZFP is isolated that causes a growth defect or lethality when conditionally expressed in a cell, e.g., a pathogenic bacterium. Such a ZFP can be identified by transforming the cell with a ZFP library that includes nucleic acids encoding ZFPs, expression of the nucleic acids being controlled by an inducible promoter. Transformants are cultured on non-inducible media and then replica-plated on both inducible and non-inducible plates. Colonies that grow normally on the non-inducible plate, but show defective growth on the inducible plate, are identified as “conditional lethal” or “conditional growth defective” colonies.

[0186] (a) Identification of Target Genes using a cDNA Library

[0187] A cDNA expression library is then transformed into the “conditional lethal” or “conditional growth defective” strains described above. Transformants are plated on inducible plates. Colonies that survive, despite the presence and expression of the ZFP that causes the defect, are isolated. The nucleic acid sequences of cDNAs that complement the defect are characterized. These cDNAs can be transcripts of direct or indirect target genes that are regulated by the chimeric ZFP that mediates the defect.

[0188] (b) Identification of Target Genes using a Secondary ZFP Library

[0189] A second chimeric protein that suppresses the effect of the first chimeric protein is identified. The targets of the second chimeric protein (in the presence or absence of the first chimeric protein) are identified.

[0190] For example, a ZFP library is transformed into “conditional lethal” or “conditional growth defective” colonies (which include a first chimeric ZFP that causes the defect). Transformants are plated on inducing plates. Colonies that survive owing to expression of the introduced ZFP are identified as “suppressed strains”. Target genes of the second ZFPs can be characterized by DNA microarray analysis. The comparative analysis can be done among four strains: 1) no ZFP; 2) the first ZFP alone; 3) the second ZFP alone; and 4) the first and second ZFPs together. For example, genes that are regulated in opposing directions by the first and second chimeric ZFPs are candidates for targets that mediate the growth-defective phenotype. This method can be applied to any phenotype, not just a growth defect.
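
The selection of genes regulated in opposing directions can be illustrated with a short sketch. It assumes that microarray results for the strain expressing the first ZFP alone and the strain expressing the second ZFP alone have already been reduced to log2 fold changes relative to the strain with no ZFP. The function name `opposing_targets`, the gene labels, and the `min_abs` cutoff are hypothetical choices made for illustration.

```python
def opposing_targets(log2fc_first, log2fc_second, min_abs=1.0):
    """Return genes changed in opposite directions by the two ZFPs.

    Each argument maps a gene name to its log2 fold change relative to
    the no-ZFP strain. Only genes measured in both comparisons and
    changed by at least `min_abs` (absolute log2 units) are reported.
    """
    candidates = []
    for gene in sorted(set(log2fc_first) & set(log2fc_second)):
        a = log2fc_first[gene]
        b = log2fc_second[gene]
        # Opposite signs give a negative product.
        if abs(a) >= min_abs and abs(b) >= min_abs and a * b < 0:
            candidates.append(gene)
    return candidates


if __name__ == "__main__":
    first_alone = {"geneA": 2.0, "geneB": -1.5, "geneC": 1.2, "geneD": 0.3}
    second_alone = {"geneA": -1.8, "geneB": -2.0, "geneC": -0.4, "geneD": -3.0}
    print(opposing_targets(first_alone, second_alone))
```

Data from the strain expressing both ZFPs could then be used to check whether expression of each candidate returns toward the no-ZFP level.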

[0191] (c) Co-Regulated Genes Identified by Expression Profiling Analysis

[0192] A candidate target of a chimeric ZFP can be identified by expression profiling. Subsequently, to determine if the candidate target mediates the phenotype of the chimeric ZFP, the candidate target can be independently over-expressed or inhibited (e.g., by genetic deletion). In addition, it may be possible to apply this analysis to multiple candidate targets since in at least some cases more than one candidate may need to be perturbed to cause the phenotype.

[0193] (d) Time-Course Analysis

[0194] The targets of a chimeric ZFP can be identified by characterizing changes in gene expression with respect to time after a cell is exposed to the chimeric ZFP. For example, a gene encoding the chimeric ZFP can be operably linked to an inducible promoter. An exemplary inducible promoter is regulated by a small molecule such as doxycycline. The gene encoding the chimeric ZFP is introduced into cells. mRNA samples are obtained from cells at various times after induction of the inducible promoter.

[0195] Target DNA Site Identification

[0196] With respect to chimeric DNA binding proteins, a variety of methods can be used to determine the target site of a chimeric DNA binding protein that produces a phenotype of interest. Such methods can be used, alone or in combination, to find such a target site.

[0197] In one embodiment, information from expression profiles is used to identify the target site recognized by a chimeric zinc finger protein. The regulatory regions of genes that are co-regulated by the chimeric zinc finger protein are compared to identify a motif that is common to all or many of the regulatory regions.
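
One simple way to look for a motif shared by regulatory regions is to count exact k-mers that occur in most of the regions. The sketch below is an assumption-laden simplification: it considers only exact matches on the given strand, ignores reverse complements and degenerate positions, and uses made-up sequences. The function name `shared_kmers` and its parameters are hypothetical.

```python
from collections import Counter


def shared_kmers(regions, k=9, min_fraction=0.8):
    """Return k-mers present in at least `min_fraction` of the regions.

    `regions` is a list of DNA strings (regulatory regions of
    co-regulated genes). Each k-mer is counted once per region.
    """
    counts = Counter()
    for seq in regions:
        seq = seq.upper()
        # Use a set so that repeats within one region count only once.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    needed = min_fraction * len(regions)
    return sorted(kmer for kmer, n in counts.items() if n >= needed)


if __name__ == "__main__":
    # Made-up regulatory regions that share one 9-bp site.
    regions = ["TTGAAGAGGACAT", "CCGAAGAGGACGG", "AAGAAGAGGACTT"]
    print(shared_kmers(regions, k=9, min_fraction=1.0))
```

A dedicated motif-discovery tool would normally be preferred for real data, since binding sites are rarely conserved exactly.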

[0198] In another embodiment, biochemical means are used to determine what DNA site is bound by the chimeric zinc finger protein. For example, chromatin immunoprecipitation experiments can be used to isolate nucleic acid to which the chimeric zinc finger protein is bound. The isolated nucleic acid is PCR amplified and sequenced. See, e.g., Gogus et al. (1996) Proc. Natl. Acad. Sci. USA. 93:2159-2164. The SELEX method is another exemplary method that can be used. Further, information about the binding specificity of individual zinc finger domains in the chimeric zinc finger protein can be used to predict the target site. The prediction can be validated or can be used to guide interpretation of other results (e.g., from chromatin immunoprecipitation, in silico analysis of co-regulated genes, and SELEX).

[0199] In still another embodiment, a potential target site is inferred based on information about the binding specificity of each component zinc finger. For example, the domains CSNR, RSNR, and QSNR have the following respective DNA binding specificities: GAC, GAG, and GAA. The expected target site is formed by considering the domains in C-terminal to N-terminal order and concatenating their recognition specificities to obtain one strand of the target site in 5′ to 3′ order.
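
The concatenation rule can be written out as a short sketch. The specificity table contains only the three domains named above; the assumption that the domains are arranged in the protein in the order listed (CSNR at the N-terminus, QSNR at the C-terminus) is made here purely for illustration, and the function name `predict_target_site` is hypothetical.

```python
# Recognition specificities stated in the text for three example domains.
SPECIFICITY = {"CSNR": "GAC", "RSNR": "GAG", "QSNR": "GAA"}


def predict_target_site(domains_n_to_c, specificity=SPECIFICITY):
    """Predict one strand of the target site, written 5' to 3'.

    `domains_n_to_c` lists the zinc finger domains in N-terminal to
    C-terminal order. The domains are read in C-terminal to N-terminal
    order and their 3-bp specificities are concatenated.
    """
    return "".join(specificity[name] for name in reversed(domains_n_to_c))


if __name__ == "__main__":
    # Assumed arrangement: N-terminus CSNR, RSNR, QSNR C-terminus.
    print(predict_target_site(["CSNR", "RSNR", "QSNR"]))
```

Under the assumed arrangement, the C-terminal finger (QSNR) contributes the 5′ triplet and the N-terminal finger (CSNR) contributes the 3′ triplet.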

[0200] Although in most cases, chimeric zinc finger proteins are likely to function as transcriptional regulators, it is possible that in some cases the chimeric zinc finger proteins mediate their phenotypic effect by binding to an RNA or protein target. Some naturally-occurring zinc finger proteins in fact bind to these macromolecules.

[0201] Additional Features of Zinc Finger Proteins

[0202] In addition to one, two, three, four, or more zinc finger domains, artificial polypeptides may optionally include a regulatory domain, or other features described herein. Regulatory domains include activation domains and repression domains. In bacteria, activation domain function can be emulated by a domain that recruits a wild-type RNA polymerase alpha subunit C-terminal domain or a mutant alpha subunit C-terminal domain, e.g., a C-terminal domain fused to a protein interaction domain. Bacterial activation domains include the bacteriophage T4 Gp45-Gp55 complex, class II catabolite activator protein, also known as CRP, and bacteriophage Mu Mor protein (see also Hochschild and Dove, Cell. 92: 597-600, 1998). Bacterial repression domains, in many cases, also act by binding a C-terminal domain of an RNA polymerase alpha subunit (Hochschild and Dove, Cell. 92: 597-600, 1998).

[0203] Peptide Linkers. Zinc finger domains can be connected by a variety of linkers. The utility and design of linkers are well known in the art. A particularly useful linker is a peptide linker that is encoded by nucleic acid. Thus, one can construct a synthetic gene that encodes a first DNA binding domain, the peptide linker, and a second DNA binding domain. This design can be repeated in order to construct large, synthetic, multi-domain DNA binding proteins. PCT WO 99/45132 and Kim and Pabo ((1998) Proc. Natl. Acad. Sci. USA 95:2812-7) describe the design of peptide linkers suitable for joining zinc finger domains.

[0204] Additional peptide linkers are available that form random coil, α-helical or β-pleated secondary structures. Polypeptides that form suitable flexible linkers are well known in the art (see, e.g., Robinson and Sauer (1998) Proc Natl Acad Sci USA. 95:5929-34). Flexible linkers typically include glycine, because this amino acid, which lacks a side chain, is unique in its rotational freedom. Serine or threonine can be interspersed in the linker to increase hydrophilicity. In addition, amino acids capable of interacting with the phosphate backbone of DNA can be utilized in order to increase binding affinity. Judicious use of such amino acids allows for balancing increases in affinity with loss of sequence specificity. If a rigid extension is desirable as a linker, α-helical linkers, such as the helical linker described in Pantoliano et al. (1991) Biochem. 30:10117-10125, can be used. Linkers can also be designed by computer modeling (see, e.g., U.S. Pat. No. 4,946,778). Software for molecular modeling is commercially available (e.g., from Molecular Simulations, Inc., San Diego, Calif.). The linker is optionally optimized, e.g., to reduce antigenicity and/or to increase stability, using standard mutagenesis techniques and appropriate biophysical tests as practiced in the art of protein engineering, and functional assays as described herein.

[0205] For implementations utilizing zinc finger domains, the peptide that occurs naturally between zinc fingers can be used as a linker to join fingers together. A typical such naturally occurring linker is: Thr-Gly-(Glu or Gln)-(Lys or Arg)-Pro-(Tyr or Phe) (SEQ ID NO:124).
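
The linker consensus above can be expressed as a pattern over one-letter amino acid codes, which may be convenient for locating linkers in a candidate protein sequence. The sketch below is illustrative only; the pattern is a direct transcription of the consensus (T, G, E or Q, K or R, P, Y or F), while the function name `find_linkers` and the test sequence are hypothetical.

```python
import re

# Thr-Gly-(Glu or Gln)-(Lys or Arg)-Pro-(Tyr or Phe) in one-letter codes.
LINKER_PATTERN = re.compile(r"TG[EQ][KR]P[YF]")


def find_linkers(protein_seq):
    """Return (start, matched_sequence) for each linker-like hexapeptide.

    `protein_seq` is a protein sequence in one-letter code; positions
    are 0-based offsets into the string.
    """
    return [(m.start(), m.group())
            for m in LINKER_PATTERN.finditer(protein_seq.upper())]


if __name__ == "__main__":
    # Made-up fragment containing two linker-like hexapeptides.
    print(find_linkers("HTGEKPYKCTGQRPFAC"))
```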

[0206] Dimerization Domains. An alternative method of linking DNA binding domains is the use of dimerization domains, especially heterodimerization domains (see, e.g., Pomerantz et al. (1998) Biochemistry 37:965-970). In this implementation, DNA binding domains are present in separate polypeptide chains. For example, a first polypeptide includes DNA binding domain A, linker, and domain B, while a second polypeptide includes domain C, linker, and domain D. An artisan can select a dimerization domain from the many well-characterized dimerization domains. Domains that favor heterodimerization can be used if homodimers are not desired. A particularly adaptable dimerization domain is the coiled-coil motif, e.g., a dimeric parallel or anti-parallel coiled-coil. Coiled-coil sequences that preferentially form heterodimers are also available (Lumb and Kim, (1995) Biochemistry 34:8642-8648). Another species of dimerization domain is one in which dimerization is triggered by a small molecule or by a signaling event. For example, a dimeric form of FK506 can be used to dimerize two FK506 binding protein (FKBP) domains. Such dimerization domains can be utilized to provide additional levels of regulation.

[0207] Expression of Zinc Finger Proteins

[0208] Methods described herein can include use of routine techniques in the field of molecular biology, biochemistry, classical genetics, and recombinant genetics. Basic texts disclosing the general methods of use in this invention include Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 1994).

[0209] In addition to other methods described herein, nucleic acids encoding zinc finger proteins can be constructed using synthetic oligonucleotides as linkers to construct a synthetic gene. In another example, synthetic oligonucleotides are used as primers to amplify sequences encoding one or more zinc finger domains, e.g., from an RNA or DNA template, artificial or synthetic (see U.S. Pat. Nos. 4,683,195 and 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al., eds, 1990)). Methods such as polymerase chain reaction (PCR) can be used to amplify nucleic acid sequences directly from mRNA, from cDNA, from genomic, cDNA, or zinc finger protein libraries. Degenerate oligonucleotides can be designed to amplify homologs using the sequences provided herein. Restriction endonuclease sites can be incorporated into the primers.

[0210] Gene expression of zinc finger proteins can also be analyzed by techniques known in the art, e.g., reverse transcription and amplification of mRNA, isolation of total RNA or polyA⁺ RNA, northern blotting, dot blotting, in situ hybridization, RNase protection, nucleic acid array technology, and the like.

[0211] The polynucleotide encoding an artificial zinc finger protein can be cloned into vectors before transformation into prokaryotic or eukaryotic cells for replication and/or expression. These vectors are typically prokaryotic vectors, e.g., plasmids, phage or shuttle vectors, or eukaryotic vectors.

[0212] Protein Expression. To obtain recombinant expression (e.g., high-level expression) of a polynucleotide encoding an artificial zinc finger protein, one can subclone the relevant coding nucleic acids into an expression vector that contains a strong promoter to direct transcription, a transcription/translation terminator, and a ribosome binding site for translational initiation. Suitable bacterial promoters are well known in the art and described, e.g., in Sambrook et al., and Ausubel et al., supra. Bacterial expression systems are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., (1983) Gene 22:229-235; Mosbach et al., (1983) Nature 302:543-545). Kits for such expression systems are commercially available. Eukaryotic expression systems for mammalian cells, yeast (e.g., S. cerevisiae, S. pombe, Pichia, and Hansenula), and insect cells are well known in the art and are also commercially available.

[0213] Selection of the promoter used to direct expression of a heterologous nucleic acid depends on the particular application. The promoter is preferably positioned about the same distance from the heterologous transcription start site as it is from the transcription start site in its natural setting. As is known in the art, however, some variation in this distance can be accommodated without loss of promoter function.

[0214] A nucleic acid sequence encoding a chimeric zinc finger protein can be cloned into a vector that will permit regulatable expression of the artificial polypeptide, e.g., an inducible expression vector as described in Kang and Kim, (2000) J Biol Chem 275:8742. The inducible expression vector can include a regulatable promoter or regulatory sequence. A useful promoter or sequence for controlling expression of an artificial polypeptide is one that is selectively activated or repressed in certain conditions. Regulatable promoters include promoters responsive to an environmental parameter, e.g., thermal changes, hormones, metals, metabolites, antibiotics, or chemical agents. By modulating the concentration of an agent that can regulate the promoter or sequence, the expression of the target prokaryotic gene (e.g., the endogenous gene) can be regulated in a concentration-dependent manner.

[0215] Regulatable promoters appropriate for use in E. coli include promoters which contain transcription factor binding sites from the lac, tac, trp, trc, and tet operator sequences, or operons, the alkaline phosphatase promoter (pho), an arabinose promoter such as an araBAD promoter, the rhamnose promoter, the promoters themselves, or functional fragments thereof (see, e.g., Elvin et al., 1990, Gene 37: 123-126; Tabor and Richardson, 1998, Proc. Natl. Acad. Sci. U.S.A. 1074-1078; Chang et al., 1986, Gene 44: 121-125; Lutz and Bujard, March 1997, Nucl. Acids. Res. 25: 1203-1210; D. V. Goeddel et al., Proc. Nat. Acad. Sci. U.S.A., 76:106-110, 1979; J. D. Windass et al. Nucl. Acids. Res., 10:6639-57, 1982; R. Crowl et al., Gene, 38:31-38, 1985; Brosius, 1984, Gene 27: 161-172; Amann and Brosius, 1985, Gene 40: 183-190; Guzman et al., 1992, J. Bacteriol., 174: 7716-7728; Haldimann et al., 1998, J. Bacteriol., 180: 1277-1286). Inducible promoter systems such as lac promoters may be regulated by repressor or inducer molecules. Lac promoters are induced by lactose or structurally related molecules such as isopropyl-beta-D-thiogalactoside (IPTG) and are repressed by glucose. Some inducible promoters are induced by a process of derepression, e.g., inactivation of a repressor molecule.

[0216] A regulatable promoter sequence can also be indirectly regulated. Examples of promoters that can be engineered for indirect regulation include the phage lambda PR and PL promoters and the phage T7, SP6, and T5 promoters. For example, the regulatory sequence is repressed or activated by a factor whose expression is regulated, e.g., by an environmental parameter. One example of such a promoter is a T7 promoter. The expression of the T7 RNA polymerase can be regulated by an environmentally-responsive promoter such as the lac promoter. For example, the cell can include an artificial nucleic acid that includes a sequence encoding the T7 RNA polymerase and a regulatory sequence (e.g., the lac promoter) that is regulated by an environmental parameter (Studier, F. W., and Moffatt, B. A. J. Mol. Biol. 189(1): 113-30, 1986). The activity of the T7 RNA polymerase can also be regulated by the presence of a natural inhibitor of the RNA polymerase, such as T7 lysozyme (Studier, F. W. J. Mol. Biol. 219(1): 37-44, 1991).

[0217] In addition to the promoter, the expression vector typically contains a transcription unit or expression cassette that contains all the additional elements required for expression in host cells. A typical expression cassette thus contains a promoter operably linked to the coding nucleic acid sequence and signals appropriate for efficient expression in the host cell type, e.g., polyadenylation of the transcript, ribosome binding sites, and translation termination. Additional elements of the cassette, e.g., for expression in eukaryotes, may include enhancers and, if genomic DNA is used as the structural gene, introns with functional splice donor and acceptor sites.

[0218] In addition to a promoter sequence, the expression cassette should also contain a transcription termination region downstream of the structural gene to provide for efficient termination. The termination region may be obtained from the same gene as the promoter sequence or may be obtained from different genes.

[0219] The particular expression vector used to transport the genetic information into the cell is not particularly critical. Any of the conventional vectors used for expression in eukaryotic or prokaryotic cells may be used. Standard bacterial expression vectors include plasmids such as pBR322-based plasmids, pSKF, pET23D, and fusion expression systems such as MBP, GST, and LacZ. Epitope tags can also be added to recombinant proteins to provide convenient methods of isolation, e.g., a c-myc tag or a hexa-histidine tag.

[0220] Expression vectors can contain regulatory elements from eukaryotic viruses, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A⁺, pMTO10/A⁺, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the CMV promoter, SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.

[0221] Expression of proteins from eukaryotic vectors can also be regulated using inducible promoters. With inducible promoters, expression levels are tied to the concentration of inducing agents, such as tetracycline or ecdysone, by the incorporation of response elements for these agents into the promoter. Generally, high-level expression is obtained from inducible promoters only in the presence of the inducing agent; basal expression levels are minimal. Inducible expression vectors are often chosen if expression of the protein of interest is detrimental to eukaryotic cells.

[0222] Some expression systems have markers that provide gene amplification, such as thymidine kinase and dihydrofolate reductase. Alternatively, high-yield expression systems not involving gene amplification are also suitable, such as using a baculovirus vector in insect cells, with mitochondrial respiratory chain protein encoding sequences and glycolysis protein encoding sequences under the direction of the polyhedrin promoter or other strong baculovirus promoters.

[0223] The elements that are typically included in expression vectors also include a replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of eukaryotic sequences. The prokaryotic sequences can be chosen such that they do not interfere with the replication of the DNA in eukaryotic cells.

[0224] Standard transfection methods are used to produce bacterial, mammalian, yeast or insect cell lines that express large quantities of zinc finger proteins, which are then purified using standard techniques (see, e.g., Colley et al., J. Biol. Chem. 264:17619-17622 (1989); Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, J. Bact. 132:349-351 (1977); Clark-Curtiss & Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds, 1983)).

[0225] Any of the well-known procedures for introducing foreign nucleotide sequences into host cells may be used. These include the use of calcium phosphate transfection, protoplast fusion, electroporation, liposomes, microinjection, plasmid vectors, viral vectors and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al., supra).

[0226] After the expression vector is introduced into the cells, the transfected cells are cultured under conditions favoring expression or activating expression. The protein can then be isolated from a cell extract, cell membrane component or vesicle, or media.

[0227] Expression vectors with appropriate regulatory sequences can also be used to express a heterologous gene encoding an artificial zinc finger in a model organism, e.g., a Drosophila, nematode, zebrafish, Xenopus, or mouse. See, e.g., Riddle et al., eds., C. elegans II. Plainview (NY): Cold Spring Harbor Laboratory Press; 1997.

[0228] Protein Purification. Zinc finger proteins can be purified from materials generated by any suitable expression system, e.g., those described above.

[0229] Zinc finger proteins may be purified to substantial purity by standard techniques, including selective precipitation with such substances as ammonium sulfate, column chromatography, affinity purification, immunopurification methods, and others (see, e.g., Scopes, Protein Purification: Principles and Practice (1982); U.S. Pat. No. 4,673,641; Ausubel et al., supra; and Sambrook et al., supra). For example, zinc finger proteins can include an affinity tag that can be used for purification, e.g., in combination with other steps.

[0230] Recombinant proteins are expressed by transformed bacteria in large amounts, typically after promoter induction, but expression can be constitutive. Promoter induction with IPTG is one example of an inducible promoter system. Bacteria are grown according to standard procedures in the art. Fresh or frozen bacterial cells are used for isolation of protein. Proteins expressed in bacteria may form insoluble aggregates (“inclusion bodies”). Several protocols are suitable for purifying proteins from inclusion bodies (see, e.g., Sambrook et al., supra; Ausubel et al., supra). If the proteins are soluble or exported to the periplasm, they can be obtained from cell lysates or periplasmic preparations.

[0231] Differential Precipitation. Salting in or salting out can be used to selectively precipitate a zinc finger protein or a contaminating protein. An exemplary salt is ammonium sulfate. Ammonium sulfate precipitates proteins on the basis of their solubility. The more hydrophobic a protein is, the more likely it is to precipitate at lower ammonium sulfate concentrations. A typical protocol includes adding saturated ammonium sulfate to a protein solution so that the resultant ammonium sulfate concentration is between 20% and 30%. This concentration precipitates many of the more hydrophobic proteins. The precipitate is analyzed to determine if the protein of interest is precipitated or in the supernatant. Ammonium sulfate is added to the supernatant to a concentration known to precipitate the protein of interest. The precipitate is then solubilized in buffer and the excess salt removed if necessary, either through dialysis or diafiltration.
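
The volume of saturated ammonium sulfate needed for each precipitation step can be estimated from a simple mixing balance. The sketch below assumes that the percentages refer to percent saturation, that the added stock is a 100% saturated solution, and that volumes are additive (a common approximation that ignores small volume changes on mixing). Under these assumptions, adding a volume V_add to a starting volume V at saturation S1 gives a final saturation S2 when V_add = V × (S2 − S1) / (100 − S2). The function name `saturated_volume_to_add` is hypothetical.

```python
def saturated_volume_to_add(volume_ml, target_pct, current_pct=0.0):
    """Volume (mL) of saturated ammonium sulfate to reach `target_pct`.

    Assumes percent-saturation units, a 100% saturated stock solution,
    and additive volumes. Derived from the mixing balance
    (V * S1 + V_add * 100) / (V + V_add) = S2.
    """
    if not 0.0 <= current_pct <= target_pct < 100.0:
        raise ValueError("require 0 <= current_pct <= target_pct < 100")
    return volume_ml * (target_pct - current_pct) / (100.0 - target_pct)


if __name__ == "__main__":
    # First cut: bring 100 mL of lysate from 0% to 20% saturation.
    first = saturated_volume_to_add(100.0, 20.0)
    print(f"add {first:.1f} mL to reach 20% saturation")
    # Second cut: bring the resulting volume from 20% to 30% saturation.
    second = saturated_volume_to_add(100.0 + first, 30.0, current_pct=20.0)
    print(f"add {second:.1f} mL more to reach 30% saturation")
```

When solid ammonium sulfate is added instead of a saturated solution, published tables should be consulted, because the relationship is no longer a simple volume balance.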

[0232] Column Chromatography. A zinc finger protein can be separated from other proteins on the basis of its size, net surface charge, hydrophobicity, and affinity for ligands. In addition, antibodies raised against proteins can be conjugated to column matrices and the proteins immunopurified. All of these methods are well known in the art. Chromatographic techniques can be performed at any scale and using equipment from many different manufacturers (e.g., Pharmacia Biotech). See, generally, Scopes, Protein Purification: Principles and Practice (1982).

[0233] Similarly, general protein purification procedures can be used to recover a protein whose production is altered (e.g., enhanced) by expression of an artificial zinc finger protein in a producing cell.

[0234] The invention also provides compositions, e.g., pharmaceutically acceptable compositions, which include an artificial polypeptide, e.g., as described herein, or a nucleic acid encoding such a polypeptide, formulated together with a pharmaceutically acceptable carrier.

[0235] As used herein, “pharmaceutically acceptable carrier” includes any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like that are physiologically compatible. Preferably, the carrier is suitable for intravenous, intramuscular, subcutaneous, parenteral, spinal or epidermal administration (e.g., by injection or infusion). Depending on the route of administration, the active compound may be coated in a material to protect the compound from the action of acids and other natural conditions that may inactivate the compound.

[0236] A “pharmaceutically acceptable salt” refers to a salt that retains the desired biological activity of the parent compound and does not impart any undesired toxicological effects (see e.g., Berge, S. M., et al. (1977) J. Pharm. Sci. 66:1-19). Examples of such salts include acid addition salts and base addition salts. Acid addition salts include those derived from nontoxic inorganic acids, such as hydrochloric, nitric, phosphoric, sulfuric, hydrobromic, hydroiodic, phosphorous and the like, as well as from nontoxic organic acids such as aliphatic mono- and dicarboxylic acids, phenyl-substituted alkanoic acids, hydroxyalkanoic acids, aromatic acids, aliphatic and aromatic sulfonic acids and the like. Base addition salts include those derived from alkali and alkaline earth metals, such as sodium, potassium, magnesium, calcium and the like, as well as from nontoxic organic amines, such as N,N′-dibenzylethylenediamine, N-methylglucamine, chloroprocaine, choline, diethanolamine, ethylenediamine, procaine and the like.

[0237] The compositions may be in a variety of forms. These include, for example, liquid, semi-solid and solid dosage forms, such as liquid solutions (e.g., injectable and infusible solutions), dispersions or suspensions, tablets, pills, powders, and liposomes.

[0238] The compositions can be administered by a variety of methods known in the art, although for many applications, the route/mode of administration is intravenous injection or infusion. For example, the composition can be administered by intravenous infusion at a rate of less than 30, 20, 10, 5, or 1 mg/min to reach a dose of about 1 to 100 mg/m² or 7 to 25 mg/m². The route and/or mode of administration will vary depending upon the desired results. Many methods for the preparation of such formulations are patented or generally known. See, e.g., Sustained and Controlled Release Drug Delivery Systems, J. R. Robinson, ed., Marcel Dekker, Inc., New York, 1978.

[0239] Dosage regimens are adjusted to provide the optimum desired response (e.g., a therapeutic response). For example, a single bolus may be administered, several divided doses may be administered over time, or the dose may be proportionally reduced or increased as indicated by the exigencies of the therapeutic situation. It is especially advantageous to formulate parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the subjects to be treated; each unit contains a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier. The specification for the dosage unit forms of the invention is dictated by and directly dependent on (a) the unique characteristics of the active compound and the particular therapeutic effect to be achieved, and (b) the limitations inherent in the art of compounding such an active compound for the treatment of sensitivity in individuals.

[0240] An exemplary, non-limiting range for a therapeutically or prophylactically effective amount of the protein or nucleic acid is 0.1-20 mg/kg, more preferably 1-10 mg/kg. It is to be noted that dosage values may vary with the type and severity of the condition to be alleviated. It is to be further understood that for any particular subject, specific dosage regimens should be adjusted over time according to the individual need and the professional judgment of the person administering or supervising the administration of the compositions, and that dosage ranges set forth herein are exemplary only and are not intended to limit the scope or practice of the claimed composition.

[0241] Cell-Based Therapeutics

[0242] Cell-based therapeutic methods include introducing into a cell a nucleic acid that encodes the artificial zinc finger protein operably linked to a promoter. The artificial zinc finger protein can be selected to regulate an endogenous gene in the cultured cell or to produce a desired phenotype in the cultured cell. Further, it is also possible to modify cells using nucleic acid recombination, to insert a gene encoding an artificial zinc finger protein that regulates an endogenous gene. The cell can be administered to a subject.

[0243] In vivo administration generally can include administering a pharmaceutical composition containing a therapeutically effective amount of the modified bacteria. The therapeutically effective amount will depend on the mode of administration and the strain of bacteria used. Generally, the therapeutically effective amount is an amount of bacteria sufficient to induce a desired response. In one embodiment, a given number of bacterial cells is administered. Bacteria can be administered as a function of the number of colony forming units (CFU) of the strain. For example, between 1×10³ and 1×10¹¹ CFU of bacteria can be administered per dose.

[0244] In one embodiment, bacteria are administered orally. See, e.g., Angelakopoulos H, et al. Infect Immun. 70(7): 3592-601 (2002). Briefly, bacteria are cultured, pelleted by centrifugation and washed twice with normal saline. The bacteria are resuspended at a specific turbidity for administration in normal saline or a solution that can buffer against gastric acid (e.g., citrate buffer (pH 7.0) containing sucrose; bicarbonate buffer (pH 7.0) alone (Levine et al, J. Clin. Invest., 79:888-902 (1987); and Black et al J. Infect. Dis., 155:1260-1265 (1987)), or bicarbonate buffer (pH 7.0) containing ascorbic acid, lactose, and optionally aspartame (Levine et al, Lancet, II:467-470 (1988))). Alternatively, a buffer solution is ingested prior to ingestion of the bacteria. The bacteria can be formulated into a pharmaceutical composition by combination with an appropriate pharmaceutically acceptable carrier. Appropriate carriers include proteins, e.g., as found in skim milk, sugars, e.g., sucrose, or polyvinylpyrrolidone. Typically these carriers can be used at a concentration of about 0.1-90% (w/v), and preferably at a range of 1-10% (w/v). The bacteria can be used alone or in appropriate association, as well as in combination with other pharmaceutically active compounds. The bacteria can be administered in combination with an adjuvant. The bacteria can be formulated into preparations in solid, semisolid, or liquid form such as tablets, capsules, powders, granules, ointments, solutions, suppositories, and injections, in usual ways for topical, nasal, oral, parenteral, or surgical administration. Administration in vivo can be oral, mucosal, nasal, bronchial, parenteral, subcutaneous, intravenous, intra-arterial, intramuscular, intra-organ, intra-tumoral, or surgical. Administration can include the use of an implantable container (e.g., a biodegradable or semipermeable shell, capsule, tube or other device for delivery of the bacteria) that may optionally contain a matrix upon or into which cells may be seeded. The route of administration can be selected as is appropriate for the targeted host cells. Target cells can also be removed from the subject, treated ex vivo, and the cells then returned to the subject. Other exemplary methods for in vivo administration are described in Shen et al., Proc Natl Acad Sci USA 92(9):3987-3991, 1995; Jensen et al, Immunol Rev 158: 147-157, 1997; Szalay et al., Proc Natl Acad Sci USA 92(26):12389-12392, 1995; Belyi et al, FEMS Immunol Med Microbiol 13(3): 211-213, 1996; Frankel et al., J. Immunol 155(10):4775-4782, 1995; Goossens et al., Int Immunol 7(5):797-805, 1995; Schafer et al., J. Immunol 149(1):53-59, 1992; and Linde et al., Vaccine 9(2): 101-105, 1991.

[0245] Target for Altered Protein Production

[0246] In one embodiment, a nucleic acid library is screened to identify an artificial zinc finger protein that alters production, synthesis, or activity of one or more particular target proteins in a prokaryotic cell. The alteration can increase or decrease activity or abundance of the target protein. The phenotype screened for can be associated with altered production or activity of one or more target proteins or can be the level of production or activity itself. For example, it is possible to screen a nucleic acid library for artificial polypeptides that activate or suppress expression of a reporter gene (such as those encoding luciferase, LacZ, or GFP) under the control of a regulatory sequence (e.g., the promoter) of an endogenous target gene.

[0247] The methods and compositions described herein can be applied to screening any target gene or phenotype of interest. For example, bacterial cells can be screened for a given enzyme activity. Cells having an increased or decreased amount of an enzyme activity may be isolated. Bacterial enzymes for which overexpression may be desired include oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. Expression of zinc finger proteins may coordinately modulate expression of multiple genes, either due to the organization of prokaryotic genes in operons, or by virtue of binding to multiple independent sites. Accordingly, the methods may provide for complex effects on expression of multiple genes.

[0248] The present invention will be described in more detail through the following examples. However, it should be noted that these examples are not intended to limit the scope of the present invention.

EXAMPLE 1 Construction of ZFP Libraries

[0249] In one example, various phenotypes of E. coli are altered by regulating gene expression using zinc finger protein (ZFP) expression libraries. The zinc finger proteins in these exemplary libraries consist of three or four zinc finger domains (ZFDs) and recognize 9-bp or 12-bp DNA sequences, respectively. A chimeric zinc finger protein can be identified without a priori knowledge of its target genes. We used 25 different zinc finger domains as modular building blocks to construct 3-finger or 4-finger zinc finger proteins. These libraries of ZFP expression plasmids were then transformed into E. coli. In each transformed cell, a different ZFP polypeptide is expressed and can be assayed for regulation of unspecified target genes in the genome. This alteration of the gene expression pattern can lead to phenotypic changes. In addition, once the zinc finger proteins introduced into the transformants have been identified, the regulated target genes can be identified by combining in silico prediction of target DNA sequences with genomic DNA immunoprecipitation.

[0250] (1) E. coli Strain and Plasmids

[0251] The E. coli strain used for screening of various phenotypic changes was DH5α. Strain DY330 (W3110 ΔlacU169 gal490 λcI857 Δ(cro-bioA)) was used for gene disruption by homologous recombination (Yu et al., Proc Natl Acad Sci USA. 97(11):5978-83, 2000). The parental vector used to construct libraries of zinc finger proteins was plasmid p3. The plasmid vector used for the expression of zinc finger proteins in E. coli was pZL1.

[0252] (2) Construction of Plasmid p3

[0253] The parental vector that we used to construct libraries of zinc finger proteins is the plasmid p3. p3 was constructed by modifying the pcDNA3 vector (Invitrogen, San Diego Calif.) as follows. The pcDNA3 vector was digested with HindIII and XhoI. A synthetic oligonucleotide duplex with compatible overhangs was ligated into the digested pcDNA3. The duplex contains nucleic acid that encodes the hemagglutinin (HA) tag and a nuclear localization signal. The duplex also includes restriction sites for BamHI, EcoRI, NotI, and BglII, and a stop codon. The XmaI site in the SV40 origin of the vector was destroyed by digestion with XmaI, filling in the overhanging ends of the digested XmaI restriction site, and religation of the ends.

[0254] (3) Construction of pZL1

[0255] We used pZL1 as the parental vector for conditional expression of zinc finger proteins in E. coli. pZL1 was modified from pBT-LGF2 (Clontech) to have a V5 epitope and multiple cloning sites. The following nucleic acid sequence (SEQ ID NO:124) was inserted into the ClaI and NotI sites of pBT-LGF2 to generate the pZL1 plasmid:

ATC GAT AAG CTA ATT CTC ACT CAT TAG GCA CCC CAG GCT TTA CAC TTT ATG CTT CCG GCT CGT ATA ATG TGT GGA ATT GTG AGC GGA TAA CAA TTT CAC ACA GGA AAC AGC GTC CAT GGG TAA GCC TAT CCC TAA CCC TCT CCT CGG TCT CGA TTC TAC ACA AGC TAT GGG TGC TCC TCC AAA AAA GAA GAG AAA GGT AGC TGG ATC CAC TAG TAA CGG CCG CCA GTG TGC TGG AAT TCT GCA GAT ATC CAT CAC ACT GGC GGC CGC

[0256] The library constructed in p3 was subcloned into the EcoRI and NotI sites of pZL1 to generate ZFP libraries functioning in E. coli.

[0257] (4) Library Construction

[0258] A three-fingered protein library (the “3-F library”) and a four-fingered protein library (the “4-F library”) were constructed from nucleic acids encoding 25 different ZFDs (Table 4, below).

TABLE 4 Zinc finger domains for construction of 3-finger or 4-finger ZFP libraries

Domain Name  Source       Target Sites                 Amino acid sequence         SEQ ID NO:
DSAR         Mutated¹     GTC                          FMCTWSYCGKRFTDRSALARHKRTH   125
CSNR1        Human        GAA > GAC > GAG              YKCKQCGKAFGCPSNLRRHGRTH     126
DSCR         Human        GCC                          YTCSDCGKAFRDKSCLNRHRRTH     127
DSNR         Mutated²     GAC                          YACPVESCDRRFSDSSNLTRHIRIH   128
HSSR         Human        GTT                          FKCPVCGKAFRHSSSLVRHQRTH     129
ISNR         Human        GAA > GAT > GAC              YRCKYCDRSFSISSNLQRHVRNIH    130
QFNR         Human        GAG                          YKCHQCGKAFIQSFNLRRHERTH     131
QNTQ         Drosophila³  ATA                          YTCSYCGKSFTQSNTLKQHTRIH     132
QSHV         Human        CGA > AGA > TGA              YECDHCGKSFSQSSHLNVHKRTH     133
QSNI         Human        AAA, CAA                     YMCSECGRGFSQKSNLIIHQRTH     134
QSNK         Human        GAA > TAA > AAA              YKCEECGKAFTQSSNLTKHKKIH     135
QSNR1        Human        GAA                          FECKDCGKAFIQKSNLIRHQRTH     136
QSNV2        Human        AAA, CAA                     YVCSKCGKAFTQSSNLTVHQKIH     137
QSSR1        Human        GTA > GCA                    YKCPDCGKSFSQSSSLIRHQRTH     138
QTHQ         Human        CGA > TGA, AGA               YECHDCGKSFRQSTHLTQHRRIH     139
QTHR1        Human        GGA > AGA, GAA > TGA, CGA    YECHDCGKSFRQSTHLTRHRRIH     140
RDHT         Human        TGG, AGG, CGG, GGG           FQCKTCQRKFSRSDHLKTHTRTH     141
RDKR         Human        GGG > AGG                    YVCDVEGCTWKFARSDKLNRHKKRH   142
RDNQm        Mutated⁴     AAG                          FACPECPKRFMRSDNLTQHIKTH     143
RSHR         Human        GGG                          YKCMECGKAFNRRSHLTRHQRIH     144
RSNR         Human        GAG > GTG                    YICRKCGRGFSRKSNLIRHQRTH     145
VSNV         Human        AAT > CAT > TAT              YECDHCGKAFSVSSNLNVHRRIH     146
VSSR         Human        GTT > GCT > GTG > GTA        YTCKQCGKAFSVSSSLRRHETTH     147
VSTR         Human        GCT > GCG                    YECNYCGKTFSVSSTLIRHQRIH     148
WSNR         Human        GGT                          YRCEECGKAFRWPSNLTRHKRIH     149

[0259] Superscripts in column 2 of Table 4 refer to 1) Zhang et al., (2000) J. Biol. Chem. 275:33850-33860; 2) Rebar and Pabo (1994) Science 263:671-673; 3) Gogus et al., (1996) Proc. Natl. Acad. Sci. USA. 93:2159-2164; 4) Liu et al. (2001) J. Biol. Chem. 276(14):11323-11334. The small letter m after the name of certain zinc finger domains indicates that the domain was obtained by mutation of a parental domain.

[0260] Nucleic acid fragments encoding each ZFD were individually cloned into the p3 vector to form “single fingered” vectors. Equal amounts of each “single fingered” vector were combined to form a pool. One aliquot of the pool was digested with AgeI and XhoI to obtain digested vector fragments. These vector fragments were treated with phosphatase for 30 minutes. Another aliquot of the pool was digested with XmaI and XhoI to obtain segments encoding single fingers. The digested vector nucleic acids from the AgeI and XhoI digested pool were ligated to the nucleic acid segments released from the vector by the XmaI and XhoI digestion. The ligation generated vectors that each encode two zinc finger domains. After transformation into E. coli, approximately 1.4×10⁴ independent transformants were obtained, thereby forming a two-fingered library. The size of the insert region of the two-fingered library was verified by PCR analysis of 40 colonies. The correct size insert was present in 95% of the library members.
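
The AgeI/XmaI pairing used above works because the two enzymes leave the same cohesive end. The following is a minimal sketch of that logic, not part of the described method. It assumes the commonly published recognition sequences (AgeI A^CCGGT, XmaI C^CCGGG), which are not stated in this document, and it assumes the AgeI-cut end lies upstream of the XmaI-cut end at the junction. The function names are hypothetical.

```python
# Recognition sequences (5'->3', top strand) and top-strand cut offsets.
# Assumption: commonly published values, not taken from this document.
ENZYMES = {
    "AgeI": ("ACCGGT", 1),  # A^CCGGT
    "XmaI": ("CCCGGG", 1),  # C^CCGGG
}


def overhang(name):
    """Return the 5' overhang left by a palindromic cutter."""
    site, cut = ENZYMES[name]
    return site[cut:len(site) - cut]


def ligation_junction(left_enzyme, right_enzyme):
    """Top-strand sequence formed by joining the left half of one
    cut site to the right half of another."""
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


def recut_by(junction):
    """List the enzymes whose full recognition site occurs in the junction."""
    return [name for name, (site, _) in ENZYMES.items() if site in junction]


if __name__ == "__main__":
    junction = ligation_junction("AgeI", "XmaI")
    print(overhang("AgeI"), overhang("XmaI"), junction, recut_by(junction))
```

Under these assumptions both enzymes leave a CCGG overhang, and the hybrid junction is recognized by neither enzyme, so a joined pair of fingers is not cleaved at the junction when the next finger is added.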

[0261] To prepare a three-fingered library, DNA segments encoding one finger were inserted into plasmids encoding two fingers. The 2-fingered library was digested with AgeI and XhoI. The digested plasmids, which retain nucleic acid sequences encoding two zinc finger domains, were ligated to the pool of nucleic acid segments encoding a single finger (prepared as described above by digestion with XmaI and XhoI). The products of this ligation were transformed into E. coli to obtain about 2.4×10⁵ independent transformants. Verification of the insert region confirmed that library members predominantly included sequences encoding three zinc finger domains.

[0262] To prepare a four-fingered library, DNA segments encoding two fingers were inserted into plasmids encoding two fingers. The two-fingered library was digested with XmaI and XhoI to obtain nucleic acid segments that encode two zinc finger domains. The two-fingered library was also digested with AgeI and XhoI to obtain a pool of digested plasmids. The digested plasmids, which retain nucleic acid sequences encoding two zinc finger domains, were ligated to the nucleic acid segments encoding two zinc finger domains to produce a population of plasmids encoding different combinations of four-fingered proteins. The products of this ligation were transformed into E. coli and yielded about 7×10⁶ independent transformants.
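
The transformant counts reported above can be compared with the number of finger combinations that 25 domains allow. The sketch below is illustrative only. It assumes every combination is equally represented in the ligation, which the text does not claim, and uses a Poisson approximation for the expected fraction of combinations sampled at least once. The function names are hypothetical.

```python
import math


def theoretical_diversity(n_domains, n_fingers):
    """Number of ordered finger combinations."""
    return n_domains ** n_fingers


def expected_coverage(n_transformants, diversity):
    """Expected fraction of combinations present at least once,
    assuming uniform representation (Poisson approximation)."""
    return 1.0 - math.exp(-n_transformants / diversity)


# Transformant counts are those reported in the text.
LIBRARIES = [("2-finger", 2, 1.4e4), ("3-finger", 3, 2.4e5), ("4-finger", 4, 7e6)]

if __name__ == "__main__":
    for label, fingers, transformants in LIBRARIES:
        size = theoretical_diversity(25, fingers)
        fold = transformants / size
        cov = expected_coverage(transformants, size)
        print(f"{label}: {size} combinations, {fold:.1f}x oversampled, coverage {cov:.6f}")
```

With 25 domains there are 625, 15,625, and 390,625 possible two-, three-, and four-finger combinations, so each reported library oversamples its theoretical diversity more than 15-fold under the uniformity assumption.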

[0263] 3F- or 4F-ZFP inserts were subcloned into the EcoRI and NotI sites of the pZL1 vector to generate ZFP libraries functioning in E. coli.

EXAMPLE 2 Solvent Tolerant Bacterial Cells

[0264] We screened bacterial cells expressing artificial chimeric zinc finger proteins for cells that were resistant to an organic solvent as a result of the artificial chimeric zinc finger protein. The E. coli strain DH5α was transformed with the 3-finger or 4-finger ZFP nucleic acid library formatted for prokaryotic expression. Transformants were cultured overnight in LB with chloramphenicol (34 μg/ml). The overnight culture was diluted 1:500 in 1 ml fresh LB medium with 1 mM IPTG and chloramphenicol to induce ZFP expression. After a three-hour incubation at 30° C., hexane was added to 1.5% and the mixture was rapidly vortexed to make an emulsion of hexane and E. coli culture. The mixture was incubated for three hours with shaking (250 rpm) at 37° C. and plated on LB plates with chloramphenicol (34 μg/ml). Plasmids were purified from the pool of growing colonies and transformed into DH5α. The transformants were treated with hexane as described above. Selection for hexane tolerance was repeated two additional times. Plasmids were recovered from 20 individual colonies that could grow on LB plates with chloramphenicol (34 μg/ml) after the third round of selection. These plasmids were retransformed into DH5α. Each transformant was retested for hexane tolerance as described above. Plasmids that induced hexane tolerance were sequenced to characterize the encoded zinc finger protein.
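
The working volumes implied by this protocol follow from simple dilution arithmetic. The sketch below is illustrative and rests on several assumptions that are not in the text: the stock concentrations (1 M IPTG, 34 mg/ml chloramphenicol) are hypothetical, hexane percentage is taken as volume per culture volume, and the small volume added by each reagent is ignored.

```python
def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """Volume of stock needed so that final_conc is reached in final_volume_ul
    (C1 * V1 = C2 * V2; both concentrations must share the same unit)."""
    return final_conc * final_volume_ul / stock_conc


CULTURE_UL = 1000.0  # 1 ml of fresh LB, as in the text

# 1:500 dilution of the overnight culture.
inoculum_ul = CULTURE_UL / 500

# Hypothetical stocks: 1 M (1000 mM) IPTG and 34 mg/ml (34000 ug/ml) chloramphenicol.
iptg_ul = stock_volume_ul(1.0, 1000.0, CULTURE_UL)
chloramphenicol_ul = stock_volume_ul(34.0, 34000.0, CULTURE_UL)

# Hexane at 1.5%, read here as v/v relative to the culture volume.
hexane_ul = 0.015 * CULTURE_UL

if __name__ == "__main__":
    print(inoculum_ul, iptg_ul, chloramphenicol_ul, hexane_ul)
```

Different stock concentrations change only the arguments to `stock_volume_ul`; the 1:500 inoculum and the hexane volume depend only on the culture volume.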

[0265] Three different zinc finger proteins were identified for their ability to confer hexane tolerance to E. coli cells. The amino acid sequences of these zinc finger proteins are depicted in Table 7. The sequences of the zinc finger domains of these proteins are listed in Table 1, rows 2-11. The finger motif sequences are depicted in Table 6. Hexane tolerance was evaluated by comparing the survival rate of transformants expressing one of the zinc finger proteins (H1, H2, and H3) to the survival rate of control cells. The control cells included either an empty vector (C1) or ZFP-1. The ZFP-1 construct encodes a zinc finger protein that does not confer hexane resistance and that includes the fingers RDER-QSSR-DSKR. Bacterial cells that express hexane resistance-conferring zinc finger proteins exhibited as much as a 200-fold increase in hexane tolerance (Table 5, FIG. 1A).

TABLE 5 Hexane Resistant Zinc Finger Proteins.

Expression Construct     Name    Survival Rate
Control                  C1      0.14%
Control                  ZFP-1   0.05%
Hexane resistance ZFP    H1      21.4%
Hexane resistance ZFP    H2      1.85%
Hexane resistance ZFP    H3      28.6%
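
The "200-fold" figure can be checked directly from the survival rates in Table 5. The short sketch below divides each survival rate by that of the empty-vector control C1. The H3 entry is read as 28.6%, and the function name is hypothetical.

```python
# Survival rates (percent) as listed in Table 5; H3 is read as 28.6%.
SURVIVAL_PERCENT = {"C1": 0.14, "ZFP-1": 0.05, "H1": 21.4, "H2": 1.85, "H3": 28.6}


def fold_over_control(name, control="C1"):
    """Survival rate of a construct divided by that of the control."""
    return SURVIVAL_PERCENT[name] / SURVIVAL_PERCENT[control]


if __name__ == "__main__":
    for name in ("H1", "H2", "H3"):
        print(f"{name}: {fold_over_control(name):.0f}-fold over C1")
```

Relative to C1, this gives roughly 153-fold for H1, 13-fold for H2, and 204-fold for H3, which matches the stated upper bound of about 200-fold.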

[0266] TABLE 6 Zinc finger motif sequences and DNA target sequences of proteins that confer hexane tolerance in E. coli

Name  F1    F2    F3     F4    Putative DNA target  No. of occurrences(##)
H1    RSHR  HSSR  ISNR         GAH GTT GGG          5
H2    QNTQ  CSNR  ISNR         GAH GAV ATA          1
H3    ISNR  RDHT  QTHR1  VSTR  GCT GRA NGG GAH      3 (SEQ ID NO:
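
The rows of Table 6 suggest how a putative target is assembled: each finger contributes one degenerate triplet, and the triplets are written 5′ to 3′ in the reverse of the finger order (F1 pairs with the 3′-most triplet). The sketch below reproduces the three rows under that reading. The per-finger triplets are the consensus codes that appear in Table 6 itself, and the function name is hypothetical.

```python
# Degenerate triplet assigned to each finger, as read from Table 6.
TRIPLET = {
    "RSHR": "GGG", "HSSR": "GTT", "ISNR": "GAH", "QNTQ": "ATA",
    "CSNR": "GAV", "RDHT": "NGG", "QTHR1": "GRA", "VSTR": "GCT",
}


def putative_target(fingers):
    """Join per-finger triplets 5'->3', last finger first."""
    return " ".join(TRIPLET[name] for name in reversed(fingers))


if __name__ == "__main__":
    print(putative_target(["RSHR", "HSSR", "ISNR"]))            # H1
    print(putative_target(["QNTQ", "CSNR", "ISNR"]))            # H2
    print(putative_target(["ISNR", "RDHT", "QTHR1", "VSTR"]))   # H3
```

The mapping is a lookup table rather than a prediction model, so it covers only the fingers listed here.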

[0267] TABLE 7 Amino acid sequences of ZFP-TFs isolated from E. coli phenotype screening

ZFP  SEQ ID NO:  Amino acid sequence
H1   51          YKCMECGKAFNRRSHLTRHQRIHTGEKPFKCPVCGKAFRHSSSLVRHQRTHTGEKPYRCKYCDRSFSISSNLQRHVRNIH
H2   52          YTCSYCGKSFTQSNTLKQHTRIHTGEKPYKCKQCGKAFGCPSNLRRHGRTHTGEKPYRCKYCDRSFSISSNLQRHVRNIH
H3   53          YRCKYCDRSFSISSNLQRHVRNIHTGEKPFQCKTCQRKFSRSDHLKTHTRTHTGEKPYECHDCGKSFRQSTHLTRHRRIHTGEKPYECNYCGKTFSVSSTLIRHQRIH

EXAMPLE 3 Thermo-Tolerant Bacterial Cells

[0268] We screened for zinc finger proteins that conferred heat resistance to cells. The nucleic acid library encoding different zinc finger proteins was transformed into E. coli cells, which were cultured overnight in LB with chloramphenicol (34 μg/ml). The overnight culture was diluted 1:500 in 1 ml fresh LB medium with 1 μM IPTG and chloramphenicol (34 μg/ml) to induce ZFP expression. After a 3-hour incubation at 30° C., 100 μl of culture was transferred to a micro-centrifuge tube and incubated in a water bath at 55° C. for 2 hrs. The culture was plated on LB plates with chloramphenicol (34 μg/ml). Plasmids were purified from the pool of growing colonies and transformed into DH5α. Selection for thermotolerance was repeated with the retransformants. Plasmids were purified from 30 individual colonies that could grow on LB plates with chloramphenicol (34 μg/ml) after the third round of selection and were retransformed into DH5α. Each transformant was analyzed for thermo-tolerance as described above. Plasmids that could induce thermo-tolerance were sequenced to identify the ZFP.

[0269] Ten different zinc finger proteins were identified, and the improvement in thermo-tolerance was analyzed by comparing the survival rate of ZFP transformants with that of control cells (C1 or ZFP-2) upon heat treatment. The amino acid sequences of these zinc finger proteins are depicted in Table 9. The sequences of the zinc finger domains of these proteins are listed in Table 1, rows 12-51. The finger motif sequences are depicted in Table 8. C1 and ZFP-2 denote transformants carrying the empty vector and a control ZFP that has no effect on thermotolerance (QTHQ-RSHR-QTHR1), respectively. More than 99.99% of wild type cells died upon heat treatment at 55° C. for 2 hours. In contrast, about 6% of cells transformed with certain ZFP-TFs survived under these extreme conditions, a roughly 700-fold increase in the thermotolerance phenotype, calculated as the percentage of cells expressing ZFP-TFs that survived under stress conditions (6.3%) divided by the percentage of C1 cells that survived under the same conditions (0.0085%) (FIG. 1B).

TABLE 8 ZFPs that confer thermotolerance.

Name  F1     F2     F3     F4     Putative DNA target      Occurrences  SEQ ID NO:
T-1   QSHV   VSNV   QSNK   QSNK   5′DAA DAA AAT HGA 3′     6            150
T-2   RDHT   QSHV   QTHR1  QSSR1  5′GYA GRA HGA NGG K 3′   3            151
T-3   WSNR   QSHV   VSNV   QSHV   5′HGA AAT HGA GGT 3′     1            152
T-4   QTHR1  RSHR   QTHR1  QTHR1  5′GRA GRA GGG GRA 3′     1            153
T-5   DSAR   RDHT   QSHV   QTHR1  5′GRA HGA NGG GTC 3′     2            154
T-6   QTHQ   RSHR   QTHR1  QTHR1  5′GRA GRA GGG HGA 3′     1            155
T-7   QSHV   VSNV   QSNR1  CSNR1  5′GAV GAA AAT HGA 3′     3            156
T-8   VSNV   QTHR1  QSSR1  RDHT   5′NGG GYA GRA AAT 3′     2            157
T-9   RDHT   QSHV   QTHR1  QSNR1  5′GAA GRA HGA NGG K 3′   2            158
T-10  DSAR   RDHT   QSNK   QTHR1  5′GRA DAA NGG GTC 3′     2            159

[0270] TABLE 9 Amino acid sequences of ZFP-TFs isolated from E. coli phenotype screening

ZFP  SEQ ID NO:  Amino acid sequence
T1   54          YECDHCGKSF SQSSHLNVHK RTHTGEKPYE CDHCGKAFSV SSNLNVHRRI HTGEKPYKCE ECGKAFTQSS NLTKHKKIHT GEKPYKCEEC GKAFTQSSNL TKHKKIH
T2   55          FQCKTCQRKF SRSDHLKTHT RTHTGEKPYE CDHCGKSFSQ SSHLNVHKRT HTGEKPYECH DCGKSFRQST HLTRHRRIHT GEKPYKCPDC GKSFSQSSSL IRHQRTH
T3   56          YRCEECGKAF RWPSNLTRHK RIHTGEKPYE CDHCGKSFSQ SSHLNVHKRT HTGEKPYECD HCGKAFSVSS NLNVHRRIHT GEKPYECDHC GKSFSQSSHL NVHKRTH
T4   57          YECHDCGKSF RQSTHLTRHR RIHTGEKPYK CMECGKAFNR RSHLTRHQRI HTGEKPYECH DCGKSFRQST HLTRHRRIHT GEKPYECHDC GKSFRQSTHL TRHRRIH
T5   58          FMCTWSYCGK RFTDRSALAR HKRTHTGEKP FQCKTCQRKF SRSDHLKTHT RTHTGEKPYE CDHCGKSFSQ SSHLNVHKRT HTGEKPYECH DCGKSFRQST HLTRHRRIH
T6   59          YECHDCGKSF RQSTHLTQHR RIHTGEKPYK CMECGKAFNR RSHLTRHQRI HTGEKPYECH DCGKSFRQST HLTRHRRIHT GEKPYECHDC GKSFRQSTHL TRHRRIH
T7   60          YECDHCGKSF SQSSHLNVHK RTHTGEKPYE CDHCGKAFSV SSNLNVHRRI HTGEKPFECK DCGKAFIQKS NLIRHQRTHT GEKPYKCKQC GKAFGCPSNL RRHGRTH
T8   61          YECDHCGKAF SVSSNLNVHR RIHTGEKPYE CHDCGKSFRQ STHLTRHRRI HTGEKPYKCP DCGKSFSQSS SLIRHQRTHT GEKPFQCKTC QRKFSRSDHL KTHTRTH
T9   62          FQCKTCQRKF SRSDHLKTHT RTHTGEKPYE CDHCGKSFSQ SSHLNVHKRT HTGEKPYECH DCGKSFRQST HLTRHRRIHT GEKPFECKDC GKAFIQKSNL IRHQRTH
T10  63          FMCTWSYCGK RFTDRSALAR HKRTHTGEKP FQCKTCQRKF SRSDHLKTHT RTHTGEKPYK CEECGKAFTQ SSNLTKHKKI HTGEKPYECH DCGKSFRQST HLTRHRRIH

[0271] The T9 ZFP was further analyzed by site-directed mutagenesis of an arginine residue critical for DNA binding to an alanine. The mutated T9 ZFP (T9-M) failed to induce heat shock resistance in E. coli (FIG. 1C), suggesting that the capability of T9 ZFP-TF to induce thermotolerance is dependent on the binding of the ZFP to the target DNA.

EXAMPLE 4 Identification of ZFP Target Genes

[0272] A benefit of the ZFP approach, in contrast to chemical or UV mutagenesis, is that it allows for the identification and characterization of target genes associated with the improved phenotype based on the expected binding sequences of the ZFP.

[0273] A combined approach of chromatin immuno-precipitation and in silico prediction of binding sites of ZFP was undertaken to identify target genes of T9 ZFP that induce thermo-tolerance in E. coli. E. coli genomic DNA fragments that were cross-linked with T9 ZFP were immuno-precipitated by the modified chromatin immuno-precipitation method (Weinmann & Farnham, Methods. 26(1):37-47, 2002).

[0274] Briefly, E. coli cells were grown to an OD₆₀₀ of 1.0-1.5 in 100 ml LB medium containing chloramphenicol and 1 mM IPTG. Formaldehyde was added directly to the medium at a final concentration of 1%. Fixation proceeded at room temperature with gentle swirling for 15 min and was stopped by the addition of glycine to a final concentration of 0.125 M. Cells were harvested and washed twice with phosphate buffer. Cells were resuspended in buffer (150 mM NaCl, 50 mM HEPES/KOH pH 7.5, 1 mM EDTA, 10% glycerol, 0.1% NP40, 0.17 mM PMSF, protease inhibitor cocktail, 100 μg/ml lysozyme) and sonicated. The solution was centrifuged and the supernatant was precleared with the addition of 50 μl of protein A beads and 50 μg of carrier DNA for 1 hour at 4° C. Precleared genomic DNA was incubated with 5 μl (1:100, vol/vol) anti-V5 monoclonal antibody (Invitrogen) or no antibody and rotated at 4° C. for 12-16 hours. Immuno-precipitation, washing, and elution of immune complexes were carried out twice as previously described (Weinmann & Farnham, Methods. 26(1):37-47, 2002). Cross-links were reversed by the addition of NaCl to a final concentration of 200 mM, and RNA was removed by the addition of 10 μg of RNase A per sample followed by incubation at 65° C. for 5 hours. The samples were then precipitated at −20° C. overnight by the addition of 2.5 volumes of ethanol and then pelleted by centrifugation. The pellet was resuspended in a solution of 10 mM EDTA, 30 mM Tris (pH 6.5) and 60 mg/ml proteinase K. The samples were incubated at 50° C. for 30 min and extracted with phenol-chloroform-isoamyl alcohol (25:24:1, vol/vol) followed by extraction with chloroform and then precipitated. The resuspended DNA was treated with T4 DNA polymerase to create blunt-ended DNA fragments and then cloned into a pUC19 vector (Invitrogen) digested with HincII.

[0275] After reversal of the formaldehyde cross-links and purification of the DNA, the precipitated DNA fragments were cloned into vectors and sequenced to examine whether each clone carried an expected T9 ZFP binding sequence in an intergenic region. Of 200 clones sequenced, 6 clones were identified that had a perfect or one-base-mismatched T9 ZFP binding sequence, 5′-GAA GRA HGA NGG-3′ (SEQ ID NO: 160), in their intergenic region. Since T9 ZFP was not fused with a functional domain, it was expected to function as a transcriptional repressor in E. coli (Kim and Pabo, J. Biol. Chem. 272(47):29795-800, 1997; Kang and Kim, J. Biol. Chem. 275(12):8742-8, 2000). To validate the functional relevance of T9 ZFP to the thermo-tolerance phenotype in E. coli, we knocked out each of the open reading frames associated with the 6 T9 binding sequences and examined the response of the cells to heat treatment. Strain DY330 (W3110 ΔlacU169 gal490 λcI857 Δ(cro-bioA)) was used for gene disruption by targeted homologous recombination (Yu et al., Proc Natl Acad Sci USA. 97(11):5978-83, 2000). A linear cat (Cm^(R)) cassette with 40-bp flanking arms of the target gene was amplified by PCR. Purified linear donor DNA was introduced into competent cells by electroporation, and knock-out mutants were selected from colonies growing on LB plates containing chloramphenicol.
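
The search for perfect or one-base-mismatched occurrences of 5′-GAA GRA HGA NGG-3′ can be sketched as a scan of both strands with IUPAC degeneracy codes. The code below is a minimal illustration, not the procedure actually used, which the text does not describe in detail. The function names and the toy sequence are hypothetical.

```python
# Standard IUPAC nucleotide degeneracy codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(site, window):
    """Count positions where the base is not allowed by the IUPAC code."""
    return sum(base not in IUPAC[code] for code, base in zip(site, window))


def find_sites(sequence, site, max_mismatches=1):
    """Return (strand, index, matched_sequence, n_mismatches) tuples.
    For the '-' strand, index refers to the reverse-complemented sequence."""
    site = site.replace(" ", "").upper()
    sequence = sequence.upper()
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for i in range(len(seq) - len(site) + 1):
            window = seq[i:i + len(site)]
            n = mismatches(site, window)
            if n <= max_mismatches:
                hits.append((strand, i, window, n))
    return hits


if __name__ == "__main__":
    T9_SITE = "GAA GRA HGA NGG"
    toy = "TTTTGAAGGACGATGGTTTT"  # made-up sequence for illustration
    for hit in find_sites(toy, T9_SITE):
        print(hit)
```

In practice the scan would be restricted to intergenic regions of the immunoprecipitated clones, as described above; that filtering step is outside this sketch.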

[0276] One of the genes we disrupted was the UbiX gene, which encodes 3-octaprenyl-4-hydroxybenzoate carboxy-lyase. The amino acid sequence of the UbiX gene product is shown in Table 10, below.

TABLE 10 Amino acid sequence of UbiX gene product of Escherichia coli K12 (SEQ ID NO:161); also available in GenBank®, GI No: 1788650; Acc. No.: AAC75371.1; encoded by nucleotides 2126-2695 in GenBank® genomic entry AE000320.1.

MKRLIVGISGASGAIYGVRLLQVLRDVTDIETHLVMSQAARQTLSLETDFSLREVQALA
DVTHDARDIAASISSGSFQTLGMVILPCSIKTLSGIVHSYTDGLLTRAADVVLKERRPLVLCVRETPLHLGHLRLMTQAAEIGAVIMPPVPAFYHRPQSLDDVINQTVNRVLDQFAITLPE
DLFARWQGA

[0277] The strain in which the UbiX gene (ubiX) was knocked out showed heat shock resistance upon heat treatment at 55° C. for 2 hrs. The effect of heat treatment on the viability of ubiX strains is shown in FIG. 2A. Plates grown from cultures of heat-shocked ubiX cells displayed far more colonies than plates grown from cultures of heat-shocked control cells.

[0278] Under normal conditions, the ubiX strain grew slowly and formed small colonies on plates as compared to wild type strains. However, the ubiX strain was extremely resistant to the lethal effects of heat shock. We compared the survival rate of the ubiX strain with that of wild type and T9 ZFP-expressing strains. Survival was compared by calculating the number of cells that survived under stress conditions divided by the number of cells that survived under normal conditions (FIG. 2A, right panel). The survival rate of the ubiX and T9 strains after heat treatment was 0.42% and 0.32%, respectively, whereas the survival rate of control strains was 0.005%. To verify that the T9 ZFP was able to repress UbiX at the level of transcription, we analyzed UbiX RNA levels of E. coli transformed with T9 ZFP by RT-PCR. RNA was extracted with Trizol LS (Gibco BRL) according to the manufacturer's instructions. For the analysis of UbiX gene expression, complementary DNA synthesis was performed on RNA with the UbiX-R primer (5′-CTG GAA AGA ACC GGA AGA GAT GCT G-3′) (SEQ ID NO: 162). Real-time RT-PCR was performed using a Light Cycler (Corbett Research) with the UbiX-F (5′-TGA AAC GAC TCA TTG TAG GCA TCA G-3′) (SEQ ID NO: 163) and UbiX-R primer set. The RNA level of GAPDH was used as an internal control.
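
The text does not state how the real-time RT-PCR data were converted into relative RNA levels. One common approach, shown here purely as an assumption, is the comparative Ct (2^-ΔΔCt) calculation, which normalizes the target gene to an internal control and assumes roughly 100% amplification efficiency. The function name and all Ct values below are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Comparative Ct estimate of target RNA in treated vs. control cells,
    normalized to a reference gene (assumes ~100% PCR efficiency)."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values for illustration only, not measured data.
    ratio = relative_expression(
        ct_target_treated=24.2, ct_ref_treated=18.0,   # cells expressing the ZFP
        ct_target_control=23.0, ct_ref_control=18.0,   # control cells
    )
    print(f"relative target RNA level: {ratio:.2f}")
```

Under this scheme, a ratio below 0.5 corresponds to a more than 2-fold decrease in target RNA relative to control cells.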

[0279] As expected, levels of UbiX RNA decreased more than 2-fold upon T9 ZFP expression (FIG. 2B). The UbiX gene has a one-base-mismatched T9 ZFP binding site 90 bp upstream of the transcriptional start codon (position −90). The in vivo binding of T9 ZFP to the target sequences of the UbiX promoter was confirmed by immuno-precipitation (FIG. 2C). The combined results of in silico analysis, immuno-precipitation, gene knock-out mutation, and transcriptional repression by T9 ZFP suggest that UbiX is directly regulated by T9 ZFP and that moderate repression of UbiX induces heat-shock resistance in E. coli.

[0280] UbiX functions in the biosynthesis of ubiquinone, which is an essential redox component of the aerobic respiratory chains of bacteria and mitochondria (Gennis and Stewart, Escherichia coli and Salmonella: Cellular and Molecular Biology, 2^(nd) ed., p. 217-261, Neidhardt et al., eds. Am Soc. Microbiol.). It has been reported that a ubiquinone-deficient strain, ubiCA, exhibited resistance to heat (Soballe and Poole, Microbiol. 146:787-96, 2000). It is interesting to note that knock-down of UbiX expression by ZFP, in contrast to knock-out mutation, could induce heat shock resistance without causing growth defects. This result suggests that moderate regulation of target gene expression can generate a desired phenotype in microbial engineering. ZFP library technology can be used to regulate gene expression at a range of levels.

[0281] A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.

1 164 1 23 PRT Homo sapiens 1 Tyr Lys Cys Met Glu Cys Gly Lys Ala PheAsn Arg Arg Ser His Leu 1 5 10 15 Thr Arg His Gln Arg Ile His 20 2 23PRT Homo sapiens 2 Phe Lys Cys Pro Val Cys Gly Lys Ala Phe Arg His SerSer Ser Leu 1 5 10 15 Val Arg His Gln Arg Thr His 20 3 24 PRT Homosapiens 3 Tyr Arg Cys Lys Tyr Cys Asp Arg Ser Phe Ser Ile Ser Ser AsnLeu 1 5 10 15 Gln Arg His Val Arg Asn Ile His 20 4 23 PRT Homo sapiens 4Tyr Thr Cys Ser Tyr Cys Gly Lys Ser Phe Thr Gln Ser Asn Thr Leu 1 5 1015 Lys Gln His Thr Arg Ile His 20 5 23 PRT Homo sapiens 5 Tyr Lys CysLys Gln Cys Gly Lys Ala Phe Gly Cys Pro Ser Asn Leu 1 5 10 15 Arg ArgHis Gly Arg Thr His 20 6 24 PRT Homo sapiens 6 Tyr Arg Cys Lys Tyr CysAsp Arg Ser Phe Ser Ile Ser Ser Asn Leu 1 5 10 15 Gln Arg His Val ArgAsn Ile His 20 7 24 PRT Homo sapiens 7 Tyr Arg Cys Lys Tyr Cys Asp ArgSer Phe Ser Ile Ser Ser Asn Leu 1 5 10 15 Gln Arg His Val Arg Asn IleHis 20 8 23 PRT Homo sapiens 8 Phe Gln Cys Lys Thr Cys Gln Arg Lys PheSer Arg Ser Asp His Leu 1 5 10 15 Lys Thr His Thr Arg Thr His 20 9 23PRT Homo sapiens 9 Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln SerThr His Leu 1 5 10 15 Thr Arg His Arg Arg Ile His 20 10 23 PRT Homosapiens 10 Tyr Glu Cys Asn Tyr Cys Gly Lys Thr Phe Ser Val Ser Ser ThrLeu 1 5 10 15 Ile Arg His Gln Arg Ile His 20 11 23 PRT Homo sapiens 11Tyr Glu Cys Asp His Cys Gly Lys Ser Phe Ser Gln Ser Ser His Leu 1 5 1015 Asn Val His Lys Arg Thr His 20 12 23 PRT Homo sapiens 12 Tyr Glu CysAsp His Cys Gly Lys Ala Phe Ser Val Ser Ser Asn Leu 1 5 10 15 Asn ValHis Arg Arg Ile His 20 13 23 PRT Homo sapiens 13 Tyr Lys Cys Glu Glu CysGly Lys Ala Phe Thr Gln Ser Ser Asn Leu 1 5 10 15 Thr Lys His Lys LysIle His 20 14 23 PRT Homo sapiens 14 Tyr Lys Cys Glu Glu Cys Gly Lys AlaPhe Thr Gln Ser Ser Asn Leu 1 5 10 15 Thr Lys His Lys Lys Ile His 20 1523 PRT Homo sapiens 15 Phe Gln Cys Lys Thr Cys Gln Arg Lys Phe Ser ArgSer Asp His Leu 1 5 10 15 Lys Thr His Thr Arg Thr His 20 16 23 PRT Homosapiens 16 Tyr Glu Cys Asp His Cys Gly Lys Ser 
Phe Ser Gln Ser Ser HisLeu 1 5 10 15 Asn Val His Lys Arg Thr His 20 17 23 PRT Homo sapiens 17Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu 1 5 1015 Thr Arg His Arg Arg Ile His 20 18 23 PRT Homo sapiens 18 Tyr Lys CysPro Asp Cys Gly Lys Ser Phe Ser Gln Ser Ser Ser Leu 1 5 10 15 Ile ArgHis Gln Arg Thr His 20 19 23 PRT Homo sapiens 19 Tyr Arg Cys Glu Glu CysGly Lys Ala Phe Arg Trp Pro Ser Asn Leu 1 5 10 15 Thr Arg His Lys ArgIle His 20 20 23 PRT Homo sapiens 20 Tyr Glu Cys Asp His Cys Gly Lys SerPhe Ser Gln Ser Ser His Leu 1 5 10 15 Asn Val His Lys Arg Thr His 20 2123 PRT Homo sapiens 21 Tyr Glu Cys Asp His Cys Gly Lys Ala Phe Ser ValSer Ser Asn Leu 1 5 10 15 Asn Val His Arg Arg Ile His 20 22 23 PRT Homosapiens 22 Tyr Glu Cys Asp His Cys Gly Lys Ser Phe Ser Gln Ser Ser HisLeu 1 5 10 15 Asn Val His Lys Arg Thr His 20 23 23 PRT Homo sapiens 23Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu 1 5 1015 Thr Arg His Arg Arg Ile His 20 24 23 PRT Homo sapiens 24 Tyr Lys CysMet Glu Cys Gly Lys Ala Phe Asn Arg Arg Ser His Leu 1 5 10 15 Thr ArgHis Gln Arg Ile His 20 25 23 PRT Homo sapiens 25 Tyr Glu Cys His Asp CysGly Lys Ser Phe Arg Gln Ser Thr His Leu 1 5 10 15 Thr Arg His Arg ArgIle His 20 26 23 PRT Homo sapiens 26 Tyr Glu Cys His Asp Cys Gly Lys SerPhe Arg Gln Ser Thr His Leu 1 5 10 15 Thr Arg His Arg Arg Ile His 20 2723 PRT Homo sapiens 27 Phe Met Cys Thr Trp Ser Tyr Cys Gly Lys Arg PheThr Asp Arg Ser 1 5 10 15 Ala Arg His Lys Arg Thr His 20 28 23 PRT Homosapiens 28 Phe Gln Cys Lys Thr Cys Gln Arg Lys Phe Ser Arg Ser Asp HisLeu 1 5 10 15 Lys Thr His Thr Arg Thr His 20 29 23 PRT Homo sapiens 29Tyr Glu Cys Asp His Cys Gly Lys Ser Phe Ser Gln Ser Ser His Leu 1 5 1015 Asn Val His Lys Arg Thr His 20 30 23 PRT Homo sapiens 30 Tyr Glu CysHis Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu 1 5 10 15 Thr ArgHis Arg Arg Ile His 20 31 23 PRT Homo sapiens 31 Tyr Glu Cys His Asp CysGly Lys Ser Phe Arg Gln Ser Thr His Leu 1 5 10 15 Thr Gln His Arg ArgIle 
His 20 32 23 PRT Homo sapiens 32 Tyr Lys Cys Met Glu Cys Gly Lys AlaPhe Asn Arg Arg Ser His Leu 1 5 10 15 Thr Arg His Gln Arg Ile His 20 3323 PRT Homo sapiens 33 Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg GlnSer Thr His Leu 1 5 10 15 Thr Arg His Arg Arg Ile His 20 34 23 PRT Homosapiens 34 Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr HisLeu 1 5 10 15 Thr Arg His Arg Arg Ile His 20 35 23 PRT Homo sapiens 35Tyr Glu Cys Asp His Cys Gly Lys Ser Phe Ser Gln Ser Ser His Leu 1 5 1015 Asn Val His Lys Arg Thr His 20 36 23 PRT Homo sapiens 36 Tyr Glu CysAsp His Cys Gly Lys Ala Phe Ser Val Ser Ser Asn Leu 1 5 10 15 Asn ValHis Arg Arg Ile His 20 37 23 PRT Homo sapiens 37 Phe Glu Cys Lys Asp CysGly Lys Ala Phe Ile Gln Lys Ser Asn Leu 1 5 10 15 Ile Arg His Gln ArgThr His 20 38 23 PRT Homo sapiens 38 Tyr Lys Cys Lys Gln Cys Gly Lys AlaPhe Gly Cys Pro Ser Asn Leu 1 5 10 15 Arg Arg His Gly Arg Thr His 20 3923 PRT Homo sapiens 39 Tyr Glu Cys Asp His Cys Gly Lys Ala Phe Ser ValSer Ser Asn Leu 1 5 10 15 Asn Val His Arg Arg Ile His 20 40 23 PRT Homosapiens 40 Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr HisLeu 1 5 10 15 Thr Arg His Arg Arg Ile His 20 41 23 PRT Homo sapiens 41Tyr Lys Cys Pro Asp Cys Gly Lys Ser Phe Ser Gln Ser Ser Ser Leu 1 5 1015 Ile Arg His Gln Arg Thr His 20 42 23 PRT Homo sapiens 42 Phe Gln CysLys Thr Cys Gln Arg Lys Phe Ser Arg Ser Asp His Leu 1 5 10 15 Lys ThrHis Thr Arg Thr His 20 43 23 PRT Homo sapiens 43 Phe Gln Cys Lys Thr CysGln Arg Lys Phe Ser Arg Ser Asp His Leu 1 5 10 15 Lys Thr His Thr ArgThr His 20 44 23 PRT Homo sapiens 44 Tyr Glu Cys Asp His Cys Gly Lys SerPhe Ser Gln Ser Ser His Leu 1 5 10 15 Asn Val His Lys Arg Thr His 20 4523 PRT Homo sapiens 45 Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg GlnSer Thr His Leu 1 5 10 15 Thr Arg His Arg Arg Ile His 20 46 23 PRT Homosapiens 46 Phe Glu Cys Lys Asp Cys Gly Lys Ala Phe Ile Gln Lys Ser AsnLeu 1 5 10 15 Ile Arg His Gln Arg Thr His 20 47 23 PRT Homo sapiens 47Phe Met Cys Thr Trp Ser Tyr 
Cys Gly Lys Arg Phe Thr Asp Arg Ser 1 5 1015 Ala Arg His Lys Arg Thr His 20 48 23 PRT Homo sapiens 48 Phe Gln CysLys Thr Cys Gln Arg Lys Phe Ser Arg Ser Asp His Leu 1 5 10 15 Lys ThrHis Thr Arg Thr His 20 49 23 PRT Homo sapiens 49 Tyr Lys Cys Glu Glu CysGly Lys Ala Phe Thr Gln Ser Ser Asn Leu 1 5 10 15 Thr Lys His Lys LysIle His 20 50 23 PRT Homo sapiens 50 Tyr Glu Cys His Asp Cys Gly Lys SerPhe Arg Gln Ser Thr His Leu 1 5 10 15 Thr Arg His Arg Arg Ile His 20 5180 PRT Artificial Sequence Synthetically generated peptide 51 Tyr LysCys Met Glu Cys Gly Lys Ala Phe Asn Arg Arg Ser His Leu 1 5 10 15 ThrArg His Gln Arg Ile His Thr Gly Glu Lys Pro Phe Lys Cys Pro 20 25 30 ValCys Gly Lys Ala Phe Arg His Ser Ser Ser Leu Val Arg His Gln 35 40 45 ArgThr His Thr Gly Glu Lys Pro Tyr Arg Cys Lys Tyr Cys Asp Arg 50 55 60 SerPhe Ser Ile Ser Ser Asn Leu Gln Arg His Val Arg Asn Ile His 65 70 75 8052 80 PRT Artificial Sequence Synthetically generated peptide 52 Tyr ThrCys Ser Tyr Cys Gly Lys Ser Phe Thr Gln Ser Asn Thr Leu 1 5 10 15 LysGln His Thr Arg Ile His Thr Gly Glu Lys Pro Tyr Lys Cys Lys 20 25 30 GlnCys Gly Lys Ala Phe Gly Cys Pro Ser Asn Leu Arg Arg His Gly 35 40 45 ArgThr His Thr Gly Glu Lys Pro Tyr Arg Cys Lys Tyr Cys Asp Arg 50 55 60 SerPhe Ser Ile Ser Ser Asn Leu Gln Arg His Val Arg Asn Ile His 65 70 75 8053 108 PRT Artificial Sequence Synthetically generated peptide 53 TyrArg Cys Lys Tyr Cys Asp Arg Ser Phe Ser Ile Ser Ser Asn Leu 1 5 10 15Gln Arg His Val Arg Asn Ile His Thr Gly Glu Lys Pro Phe Gln Cys 20 25 30Lys Thr Cys Gln Arg Lys Phe Ser Arg Ser Asp His Leu Lys Thr His 35 40 45Thr Arg Thr His Thr Gly Glu Lys Pro Tyr Glu Cys His Asp Cys Gly 50 55 60Lys Ser Phe Arg Gln Ser Thr His Leu Thr Arg His Arg Arg Ile His 65 70 7580 Thr Gly Glu Lys Pro Tyr Glu Cys Asn Tyr Cys Gly Lys Thr Phe Ser 85 9095 Val Ser Ser Thr Leu Ile Arg His Gln Arg Ile His 100 105 54 107 PRTArtificial Sequence Synthetically generated peptide 54 Tyr Glu Cys AspHis Cys Gly Lys Ser Phe Ser Gln Ser Ser His 
Leu 1 5 10 15 Asn Val HisLys Arg Thr His Thr Gly Glu Lys Pro Tyr Glu Cys Asp 20 25 30 His Cys GlyLys Ala Phe Ser Val Ser Ser Asn Leu Asn Val His Arg 35 40 45 Arg Ile HisThr Gly Glu Lys Pro Tyr Lys Cys Glu Glu Cys Gly Lys 50 55 60 Ala Phe ThrGln Ser Ser Asn Leu Thr Lys His Lys Lys Ile His Thr 65 70 75 80 Gly GluLys Pro Tyr Lys Cys Glu Glu Cys Gly Lys Ala Phe Thr Gln 85 90 95 Ser SerAsn Leu Thr Lys His Lys Lys Ile His 100 105 55 107 PRT ArtificialSequence Synthetically generated peptide 55 Phe Gln Cys Lys Thr Cys GlnArg Lys Phe Ser Arg Ser Asp His Leu 1 5 10 15 Lys Thr His Thr Arg ThrHis Thr Gly Glu Lys Pro Tyr Glu Cys Asp 20 25 30 His Cys Gly Lys Ser PheSer Gln Ser Ser His Leu Asn Val His Lys 35 40 45 Arg Thr His Thr Gly GluLys Pro Tyr Glu Cys His Asp Cys Gly Lys 50 55 60 Ser Phe Arg Gln Ser ThrHis Leu Thr Arg His Arg Arg Ile His Thr 65 70 75 80 Gly Glu Lys Pro TyrLys Cys Pro Asp Cys Gly Lys Ser Phe Ser Gln 85 90 95 Ser Ser Ser Leu IleArg His Gln Arg Thr His 100 105 56 107 PRT Artificial SequenceSynthetically generated peptide 56 Tyr Arg Cys Glu Glu Cys Gly Lys AlaPhe Arg Trp Pro Ser Asn Leu 1 5 10 15 Thr Arg His Lys Arg Ile His ThrGly Glu Lys Pro Tyr Glu Cys Asp 20 25 30 His Cys Gly Lys Ser Phe Ser GlnSer Ser His Leu Asn Val His Lys 35 40 45 Arg Thr His Thr Gly Glu Lys ProTyr Glu Cys Asp His Cys Gly Lys 50 55 60 Ala Phe Ser Val Ser Ser Asn LeuAsn Val His Arg Arg Ile His Thr 65 70 75 80 Gly Glu Lys Pro Tyr Glu CysAsp His Cys Gly Lys Ser Phe Ser Gln 85 90 95 Ser Ser His Leu Asn Val HisLys Arg Thr His 100 105 57 107 PRT Artificial Sequence Syntheticallygenerated peptide 57 Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln SerThr His Leu 1 5 10 15 Thr Arg His Arg Arg Ile His Thr Gly Glu Lys ProTyr Lys Cys Met 20 25 30 Glu Cys Gly Lys Ala Phe Asn Arg Arg Ser His LeuThr Arg His Gln 35 40 45 Arg Ile His Thr Gly Glu Lys Pro Tyr Glu Cys HisAsp Cys Gly Lys 50 55 60 Ser Phe Arg Gln Ser Thr His Leu Thr Arg His ArgArg Ile His Thr 65 70 75 80 Gly Glu Lys Pro Tyr Glu Cys His Asp 
Cys GlyLys Ser Phe Arg Gln 85 90 95 Ser Thr His Leu Thr Arg His Arg Arg Ile His100 105 58 107 PRT Artificial Sequence Synthetically generated peptide58 Phe Met Cys Thr Trp Ser Tyr Cys Gly Lys Arg Phe Thr Asp Arg Ser 1 510 15 Ala Arg His Lys Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys Lys 2025 30 Thr Cys Gln Arg Lys Phe Ser Arg Ser Asp His Leu Lys Thr His Thr 3540 45 Arg Thr His Thr Gly Glu Lys Pro Tyr Glu Cys Asp His Cys Gly Lys 5055 60 Ser Phe Ser Gln Ser Ser His Leu Asn Val His Lys Arg Thr His Thr 6570 75 80 Gly Glu Lys Pro Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln85 90 95 Ser Thr His Leu Thr Arg His Arg Arg Ile His 100 105 59 107 PRTArtificial Sequence Synthetically generated peptide 59 Tyr Glu Cys HisAsp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu 1 5 10 15 Thr Gln HisArg Arg Ile His Thr Gly Glu Lys Pro Tyr Lys Cys Met 20 25 30 Glu Cys GlyLys Ala Phe Asn Arg Arg Ser His Leu Thr Arg His Gln 35 40 45 Arg Ile HisThr Gly Glu Lys Pro Tyr Glu Cys His Asp Cys Gly Lys 50 55 60 Ser Phe ArgGln Ser Thr His Leu Thr Arg His Arg Arg Ile His Thr 65 70 75 80 Gly GluLys Pro Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln 85 90 95 Ser ThrHis Leu Thr Arg His Arg Arg Ile His 100 105 60 107 PRT ArtificialSequence Synthetically generated peptide 60 Tyr Glu Cys Asp His Cys GlyLys Ser Phe Ser Gln Ser Ser His Leu 1 5 10 15 Asn Val His Lys Arg ThrHis Thr Gly Glu Lys Pro Tyr Glu Cys Asp 20 25 30 His Cys Gly Lys Ala PheSer Val Ser Ser Asn Leu Asn Val His Arg 35 40 45 Arg Ile His Thr Gly GluLys Pro Phe Glu Cys Lys Asp Cys Gly Lys 50 55 60 Ala Phe Ile Gln Lys SerAsn Leu Ile Arg His Gln Arg Thr His Thr 65 70 75 80 Gly Glu Lys Pro TyrLys Cys Lys Gln Cys Gly Lys Ala Phe Gly Cys 85 90 95 Pro Ser Asn Leu ArgArg His Gly Arg Thr His 100 105 61 107 PRT Artificial SequenceSynthetically generated peptide 61 Tyr Glu Cys Asp His Cys Gly Lys AlaPhe Ser Val Ser Ser Asn Leu 1 5 10 15 Asn Val His Arg Arg Ile His ThrGly Glu Lys Pro Tyr Glu Cys His 20 25 30 Asp Cys Gly Lys Ser Phe Arg GlnSer Thr His 
Leu Thr Arg His Arg 35 40 45 Arg Ile His Thr Gly Glu Lys ProTyr Lys Cys Pro Asp Cys Gly Lys 50 55 60 Ser Phe Ser Gln Ser Ser Ser LeuIle Arg His Gln Arg Thr His Thr 65 70 75 80 Gly Glu Lys Pro Phe Gln CysLys Thr Cys Gln Arg Lys Phe Ser Arg 85 90 95 Ser Asp His Leu Lys Thr HisThr Arg Thr His 100 105 62 107 PRT Artificial Sequence Syntheticallygenerated peptide 62 Phe Gln Cys Lys Thr Cys Gln Arg Lys Phe Ser Arg SerAsp His Leu 1 5 10 15 Lys Thr His Thr Arg Thr His Thr Gly Glu Lys ProTyr Glu Cys Asp 20 25 30 His Cys Gly Lys Ser Phe Ser Gln Ser Ser His LeuAsn Val His Lys 35 40 45 Arg Thr His Thr Gly Glu Lys Pro Tyr Glu Cys HisAsp Cys Gly Lys 50 55 60 Ser Phe Arg Gln Ser Thr His Leu Thr Arg His ArgArg Ile His Thr 65 70 75 80 Gly Glu Lys Pro Phe Glu Cys Lys Asp Cys GlyLys Ala Phe Ile Gln 85 90 95 Lys Ser Asn Leu Ile Arg His Gln Arg Thr His100 105 63 107 PRT Artificial Sequence Synthetically generated peptide63 Phe Met Cys Thr Trp Ser Tyr Cys Gly Lys Arg Phe Thr Asp Arg Ser 1 510 15 Ala Arg His Lys Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys Lys 2025 30 Thr Cys Gln Arg Lys Phe Ser Arg Ser Asp His Leu Lys Thr His Thr 3540 45 Arg Thr His Thr Gly Glu Lys Pro Tyr Lys Cys Glu Glu Cys Gly Lys 5055 60 Ala Phe Thr Gln Ser Ser Asn Leu Thr Lys His Lys Lys Ile His Thr 6570 75 80 Gly Glu Lys Pro Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln85 90 95 Ser Thr His Leu Thr Arg His Arg Arg Ile His 100 105 64 13 PRTSimian parainfluenza virus 5 64 Gly Lys Pro Ile Pro Asn Pro Leu Leu GlyLeu Asp Ser 1 5 10 65 89 PRT Artificial Sequence Synthetically generatedpeptide 65 Glu Arg Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg PheSer 1 5 10 15 Arg Ser Asp Glu Leu Thr Arg His Ile Arg Ile His Thr GlyGln Lys 20 25 30 Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg SerAsp His 35 40 45 Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro PheAla Cys 50 55 60 Asp Ile Cys Gly Arg Lys Phe Ala Arg Ser Asp Glu Arg LysArg His 65 70 75 80 Thr Lys Ile His Leu Arg Gln Lys Asp 85 66 21 PRTArtificial Sequence 
Synthetically generated peptide 66 Xaa Xaa Cys XaaXaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa His Xaa 1 5 10 15 Xaa Xaa XaaXaa His 20 67 23 PRT Homo sapiens 67 Tyr Lys Cys Lys Gln Cys Gly Lys AlaPhe Gly Cys Pro Ser Asn Leu 1 5 10 15 Arg Arg His Gly Arg Thr His 20 6823 PRT Homo sapiens 68 Tyr Gln Cys Asn Ile Cys Gly Lys Cys Phe Ser CysAsn Ser Asn Leu 1 5 10 15 His Arg His Gln Arg Thr His 20 69 23 PRT Homosapiens 69 Tyr Ser Cys Gly Ile Cys Gly Lys Ser Phe Ser Asp Ser Ser AlaLys 1 5 10 15 Arg Arg His Cys Ile Leu His 20 70 23 PRT Homo sapiens 70Tyr Thr Cys Ser Asp Cys Gly Lys Ala Phe Arg Asp Lys Ser Cys Leu 1 5 1015 Asn Arg His Arg Arg Thr His 20 71 23 PRT Homo sapiens 71 Tyr Lys CysLys Glu Cys Gly Lys Ala Phe Asn His Ser Ser Asn Phe 1 5 10 15 Asn LysHis His Arg Ile His 20 72 23 PRT Homo sapiens 72 Phe Lys Cys Pro Val CysGly Lys Ala Phe Arg His Ser Ser Ser Leu 1 5 10 15 Val Arg His Gln ArgThr His 20 73 24 PRT Homo sapiens 73 Tyr Arg Cys Lys Tyr Cys Asp Arg SerPhe Ser Ile Ser Ser Asn Leu 1 5 10 15 Gln Arg His Val Arg Asn Ile His 2074 23 PRT Homo sapiens 74 Tyr Glu Cys Asp His Cys Gly Lys Ala Phe SerIle Gly Ser Asn Leu 1 5 10 15 Asn Val His Arg Arg Ile His 20 75 23 PRTHomo sapiens 75 Tyr Gly Cys His Leu Cys Gly Lys Ala Phe Ser Lys Ser SerAsn Leu 1 5 10 15 Arg Arg His Glu Met Ile His 20 76 23 PRT Homo sapiens76 Tyr Lys Cys Lys Glu Cys Gly Gln Ala Phe Arg Gln Arg Ala His Leu 1 510 15 Ile Arg His His Lys Leu His 20 77 23 PRT Homo sapiens 77 Tyr LysCys His Gln Cys Gly Lys Ala Phe Ile Gln Ser Phe Asn Leu 1 5 10 15 ArgArg His Glu Arg Thr His 20 78 23 PRT Homo sapiens 78 Phe Gln Cys Asn GlnCys Gly Ala Ser Phe Thr Gln Lys Gly Asn Leu 1 5 10 15 Leu Arg His IleLys Leu His 20 79 23 PRT Homo sapiens 79 Tyr Ala Cys His Leu Cys Gly LysAla Phe Thr Gln Ser Ser His Leu 1 5 10 15 Arg Arg His Glu Lys Thr His 2080 23 PRT Homo sapiens 80 Tyr Lys Cys Gly Gln Cys Gly Lys Phe Tyr SerGln Val Ser His Leu 1 5 10 15 Thr Arg His Gln Lys Ile His 20 81 23 PRTHomo sapiens 81 Tyr Ala Cys His Leu Cys Gly Lys 
Ala Phe Thr Gln Cys SerHis Leu 1 5 10 15 Arg Arg His Glu Lys Thr His 20 82 23 PRT Homo sapiens82 Tyr Ala Cys His Leu Cys Ala Lys Ala Phe Ile Gln Cys Ser His Leu 1 510 15 Arg Arg His Glu Lys Thr His 20 83 23 PRT Homo sapiens 83 Tyr ValCys Arg Glu Cys Gly Arg Gly Phe Arg Gln His Ser His Leu 1 5 10 15 ValArg His Lys Arg Thr His 20 84 23 PRT Homo sapiens 84 Tyr Lys Cys Glu GluCys Gly Lys Ala Phe Arg Gln Ser Ser His Leu 1 5 10 15 Thr Thr His LysIle Ile His 20 85 23 PRT Homo sapiens 85 Tyr Glu Cys Asp His Cys Gly LysSer Phe Ser Gln Ser Ser His Leu 1 5 10 15 Asn Val His Lys Arg Thr His 2086 23 PRT Homo sapiens 86 Tyr Met Cys Ser Glu Cys Gly Arg Gly Phe SerGln Lys Ser Asn Leu 1 5 10 15 Ile Ile His Gln Arg Thr His 20 87 23 PRTHomo sapiens 87 Tyr Lys Cys Glu Glu Cys Gly Lys Ala Phe Thr Gln Ser SerAsn Leu 1 5 10 15 Thr Lys His Lys Lys Ile His 20 88 23 PRT Homo sapiens88 Phe Glu Cys Lys Asp Cys Gly Lys Ala Phe Ile Gln Lys Ser Asn Leu 1 510 15 Ile Arg His Gln Arg Thr His 20 89 23 PRT Homo sapiens 89 Tyr ValCys Arg Glu Cys Arg Arg Gly Phe Ser Gln Lys Ser Asn Leu 1 5 10 15 IleArg His Gln Arg Thr His 20 90 23 PRT Homo sapiens 90 Tyr Glu Cys Glu LysCys Gly Lys Ala Phe Asn Gln Ser Ser Asn Leu 1 5 10 15 Thr Arg His LysLys Ser His 20 91 23 PRT Homo sapiens 91 Tyr Glu Cys Asn Thr Cys Arg LysThr Phe Ser Gln Lys Ser Asn Leu 1 5 10 15 Ile Val His Gln Arg Thr His 2092 23 PRT Homo sapiens 92 Tyr Val Cys Ser Lys Cys Gly Lys Ala Phe ThrGln Ser Ser Asn Leu 1 5 10 15 Thr Val His Gln Lys Ile His 20 93 23 PRTHomo sapiens 93 Tyr Lys Cys Asp Glu Cys Gly Lys Asn Phe Thr Gln Ser SerAsn Leu 1 5 10 15 Ile Val His Lys Arg Ile His 20 94 23 PRT Homo sapiens94 Tyr Glu Cys Asp Val Cys Gly Lys Thr Phe Thr Gln Lys Ser Asn Leu 1 510 15 Gly Val His Gln Arg Thr His 20 95 23 PRT Homo sapiens 95 Tyr GluCys Val Gln Cys Gly Lys Gly Phe Thr Gln Ser Ser Asn Leu 1 5 10 15 IleThr His Gln Arg Val His 20 96 23 PRT Homo sapiens 96 Tyr Lys Cys Pro AspCys Gly Lys Ser Phe Ser Gln Ser Ser Ser Leu 1 5 10 15 Ile Arg His GlnArg 
Thr His 20 97 23 PRT Homo sapiens 97 Tyr Glu Cys Gln Asp Cys Gly ArgAla Phe Asn Gln Asn Ser Ser Leu 1 5 10 15 Gly Arg His Lys Arg Thr His 2098 23 PRT Homo sapiens 98 Tyr Glu Cys Asn Glu Cys Gly Lys Phe Phe SerGln Ser Ser Ser Leu 1 5 10 15 Ile Arg His Arg Arg Ser His 20 99 23 PRTHomo sapiens 99 Tyr Lys Cys Glu Glu Cys Gly Lys Ala Phe Asn Gln Ser SerThr Leu 1 5 10 15 Thr Arg His Lys Ile Val His 20 100 23 PRT Homo sapiens100 Tyr Glu Cys Asn Glu Cys Gly Lys Ala Phe Ala Gln Asn Ser Thr Leu 1 510 15 Arg Val His Gln Arg Ile His 20 101 23 PRT Homo sapiens 101 Tyr GluCys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu 1 5 10 15 ThrGln His Arg Arg Ile His 20 102 23 PRT Homo sapiens 102 Tyr Glu Cys HisAsp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu 1 5 10 15 Thr Arg HisArg Arg Ile His 20 103 22 PRT Homo sapiens 103 His Lys Cys Leu Glu CysGly Lys Cys Phe Ser Gln Asn Thr His Leu 1 5 10 15 Thr Arg His Gln ArgThr 20 104 25 PRT Homo sapiens 104 Tyr Val Cys Asp Val Glu Gly Cys ThrTrp Lys Phe Ala Arg Ser Asp 1 5 10 15 Glu Leu Asn Arg His Lys Lys ArgHis 20 25 105 25 PRT Homo sapiens 105 Tyr His Cys Asp Trp Asp Gly CysGly Trp Lys Phe Ala Arg Ser Asp 1 5 10 15 Glu Leu Thr Arg His Tyr ArgLys His 20 25 106 25 PRT Homo sapiens 106 Tyr Arg Cys Ser Trp Glu GlyCys Glu Trp Arg Phe Ala Arg Ser Asp 1 5 10 15 Glu Leu Thr Arg His PheArg Lys His 20 25 107 25 PRT Homo sapiens 107 Phe Ser Cys Ser Trp LysGly Cys Glu Arg Arg Phe Ala Arg Ser Asp 1 5 10 15 Glu Leu Ser Arg HisArg Arg Thr His 20 25 108 25 PRT Homo sapiens 108 Phe Ala Cys Ser TrpGln Asp Cys Asn Lys Lys Phe Ala Arg Ser Asp 1 5 10 15 Glu Leu Ala ArgHis Tyr Arg Thr His 20 25 109 25 PRT Homo sapiens 109 Tyr His Cys AsnTrp Asp Gly Cys Gly Trp Lys Phe Ala Arg Ser Asp 1 5 10 15 Glu Leu ThrArg His Tyr Arg Lys His 20 25 110 24 PRT Homo sapiens 110 Phe Leu CysGln Tyr Cys Ala Gln Arg Phe Gly Arg Lys Asp His Leu 1 5 10 15 Thr ArgHis Met Lys Lys Ser His 20 111 23 PRT Homo sapiens 111 Phe Gln Cys LysThr Cys Gln Arg Lys Phe Ser Arg Ser Asp His Leu 1 
5 10 15 Lys Thr HisThr Arg Thr His 20 112 23 PRT Homo sapiens 112 Phe Ala Cys Glu Val CysGly Val Arg Phe Thr Arg Asn Asp Lys Leu 1 5 10 15 Lys Ile His Met ArgLys His 20 113 25 PRT Homo sapiens 113 Tyr Val Cys Asp Val Glu Gly CysThr Trp Lys Phe Ala Arg Ser Asp 1 5 10 15 Lys Leu Asn Arg His Lys LysArg His 20 25 114 23 PRT Homo sapiens 114 Tyr Lys Cys Met Glu Cys GlyLys Ala Phe Asn Arg Arg Ser His Leu 1 5 10 15 Thr Arg His Gln Arg IleHis 20 115 23 PRT Homo sapiens 115 Tyr Ile Cys Arg Lys Cys Gly Arg GlyPhe Ser Arg Lys Ser Asn Leu 1 5 10 15 Ile Arg His Gln Arg Thr His 20 11623 PRT Homo sapiens 116 Tyr Leu Cys Ser Glu Cys Asp Lys Cys Phe Ser ArgSer Thr Asn Leu 1 5 10 15 Ile Arg His Arg Arg Thr His 20 117 23 PRT Homosapiens 117 Tyr Glu Cys Lys Glu Cys Gly Lys Ala Phe Ser Ser Gly Ser AsnPhe 1 5 10 15 Thr Arg His Gln Arg Ile His 20 118 23 PRT Homo sapiens 118Tyr Glu Cys Asp His Cys Gly Lys Ala Phe Ser Val Ser Ser Asn Leu 1 5 1015 Asn Val His Arg Arg Ile His 20 119 23 PRT Homo sapiens 119 Tyr ThrCys Lys Gln Cys Gly Lys Ala Phe Ser Val Ser Ser Ser Leu 1 5 10 15 ArgArg His Glu Thr Thr His 20 120 23 PRT Homo sapiens 120 Tyr Glu Cys AsnTyr Cys Gly Lys Thr Phe Ser Val Ser Ser Thr Leu 1 5 10 15 Ile Arg HisGln Arg Ile His 20 121 23 PRT Homo sapiens 121 Tyr Arg Cys Glu Glu CysGly Lys Ala Phe Arg Trp Pro Ser Asn Leu 1 5 10 15 Thr Arg His Lys ArgIle His 20 122 6 PRT Artificial Sequence Naturally occurring linkerpeptide 122 Thr Gly Xaa Xaa Pro Xaa 1 5 123 26 PRT Artificial SequenceSynthetically generated peptide 123 Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa CysXaa Xaa Xaa Cys Xaa Ser Asn 1 5 10 15 Xaa Xaa Arg His Xaa Xaa Xaa XaaXaa His 20 25 124 267 DNA Artificial Sequence Synthetically generatedoligonucleotide 124 atcgataagc taattctcac tcattaggca ccccaggctttacactttat gcttccggct 60 cgtataatgt gtggaattgt gagcggataa caatttcacacaggaaacag cgtccatggg 120 taagcctatc cctaaccctc tcctcggtct cgattctacacaagctatgg gtgctcctcc 180 aaaaaagaag agaaaggtag ctggatccac tagtaacggccgccagtgtg ctggaattct 240 
gcagatatcc atcacactgg cggccgc 267 125 25 PRTArtificial Sequence mutated sequence 125 Phe Met Cys Thr Trp Ser Tyr CysGly Lys Arg Phe Thr Asp Arg Ser 1 5 10 15 Ala Leu Ala Arg His Lys ArgThr His 20 25 126 23 PRT Homo sapiens 126 Tyr Lys Cys Lys Gln Cys GlyLys Ala Phe Gly Cys Pro Ser Asn Leu 1 5 10 15 Arg Arg His Gly Arg ThrHis 20 127 23 PRT Homo sapiens 127 Tyr Thr Cys Ser Asp Cys Gly Lys AlaPhe Arg Asp Lys Ser Cys Leu 1 5 10 15 Asn Arg His Arg Arg Thr His 20 12825 PRT Artificial Sequence mutated sequence 128 Tyr Ala Cys Pro Val GluSer Cys Asp Arg Arg Phe Ser Asp Ser Ser 1 5 10 15 Asn Leu Thr Arg HisIle Arg Ile His 20 25 129 23 PRT Homo sapiens 129 Phe Lys Cys Pro ValCys Gly Lys Ala Phe Arg His Ser Ser Ser Leu 1 5 10 15 Val Arg His GlnArg Thr His 20 130 24 PRT Homo sapiens 130 Tyr Arg Cys Lys Tyr Cys AspArg Ser Phe Ser Ile Ser Ser Asn Leu 1 5 10 15 Gln Arg His Val Arg AsnIle His 20 131 23 PRT Homo sapiens 131 Tyr Lys Cys His Gln Cys Gly LysAla Phe Ile Gln Ser Phe Asn Leu 1 5 10 15 Arg Arg His Glu Arg Thr His 20132 23 PRT Drosophila 132 Tyr Thr Cys Ser Tyr Cys Gly Lys Ser Phe ThrGln Ser Asn Thr Leu 1 5 10 15 Lys Gln His Thr Arg Ile His 20 133 23 PRTHomo sapiens 133 Tyr Glu Cys Asp His Cys Gly Lys Ser Phe Ser Gln Ser SerHis Leu 1 5 10 15 Asn Val His Lys Arg Thr His 20 134 23 PRT Homo sapiens134 Tyr Met Cys Ser Glu Cys Gly Arg Gly Phe Ser Gln Lys Ser Asn Leu 1 510 15 Ile Ile His Gln Arg Thr His 20 135 23 PRT Homo sapiens 135 Tyr LysCys Glu Glu Cys Gly Lys Ala Phe Thr Gln Ser Ser Asn Leu 1 5 10 15 ThrLys His Lys Lys Ile His 20 136 23 PRT Homo sapiens 136 Phe Glu Cys LysAsp Cys Gly Lys Ala Phe Ile Gln Lys Ser Asn Leu 1 5 10 15 Ile Arg HisGln Arg Thr His 20 137 23 PRT Homo sapiens 137 Tyr Val Cys Ser Lys CysGly Lys Ala Phe Thr Gln Ser Ser Asn Leu 1 5 10 15 Thr Val His Gln LysIle His 20 138 23 PRT Homo sapiens 138 Tyr Lys Cys Pro Asp Cys Gly LysSer Phe Ser Gln Ser Ser Ser Leu 1 5 10 15 Ile Arg His Gln Arg Thr His 20139 23 PRT Homo sapiens 139 Tyr Glu Cys His Asp Cys Gly Lys 
Ser Phe ArgGln Ser Thr His Leu 1 5 10 15 Thr Gln His Arg Arg Ile His 20 140 23 PRTHomo sapiens 140 Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser ThrHis Leu 1 5 10 15 Thr Arg His Arg Arg Ile His 20 141 23 PRT Homo sapiens141 Phe Gln Cys Lys Thr Cys Gln Arg Lys Phe Ser Arg Ser Asp His Leu 1 510 15 Lys Thr His Thr Arg Thr His 20 142 25 PRT Homo sapiens 142 Tyr ValCys Asp Val Glu Gly Cys Thr Trp Lys Phe Ala Arg Ser Asp 1 5 10 15 LysLeu Asn Arg His Lys Lys Arg His 20 25 143 23 PRT Artificial Sequencemutated sequence 143 Phe Ala Cys Pro Glu Cys Pro Lys Arg Phe Met Arg SerAsp Asn Leu 1 5 10 15 Thr Gln His Ile Lys Thr His 20 144 23 PRT Homosapiens 144 Tyr Lys Cys Met Glu Cys Gly Lys Ala Phe Asn Arg Arg Ser HisLeu 1 5 10 15 Thr Arg His Gln Arg Ile His 20 145 23 PRT Homo sapiens 145Tyr Ile Cys Arg Lys Cys Gly Arg Gly Phe Ser Arg Lys Ser Asn Leu 1 5 1015 Ile Arg His Gln Arg Thr His 20 146 23 PRT Homo sapiens 146 Tyr GluCys Asp His Cys Gly Lys Ala Phe Ser Val Ser Ser Asn Leu 1 5 10 15 AsnVal His Arg Arg Ile His 20 147 23 PRT Homo sapiens 147 Tyr Thr Cys LysGln Cys Gly Lys Ala Phe Ser Val Ser Ser Ser Leu 1 5 10 15 Arg Arg HisGlu Thr Thr His 20 148 23 PRT Homo sapiens 148 Tyr Glu Cys Asn Tyr CysGly Lys Thr Phe Ser Val Ser Ser Thr Leu 1 5 10 15 Ile Arg His Gln ArgIle His 20 149 23 PRT Homo sapiens 149 Tyr Arg Cys Glu Glu Cys Gly LysAla Phe Arg Trp Pro Ser Asn Leu 1 5 10 15 Thr Arg His Lys Arg Ile His 20150 12 DNA Artificial Sequence putative target sequence 150 daadaaaathga 12 151 13 DNA Artificial Sequence putative target sequence 151gyagrahgan ggk 13 152 12 DNA Artificial Sequence putative targetsequence 152 hgaaathgag gt 12 153 12 DNA Artificial Sequence putativetarget sequence 153 gragragggg ra 12 154 12 DNA Artificial Sequenceputative target sequence 154 grahganggg tc 12 155 12 DNA ArtificialSequence putative target sequence 155 gragragggh ga 12 156 12 DNAArtificial Sequence putative target sequence 156 gavgaaaath ga 12 157 12DNA Artificial Sequence putative target 
sequence 157 ngggyagraa at 12158 13 DNA Artificial Sequence putative target sequence 158 gaagrahganggk 13 159 12 DNA Artificial Sequence putative target sequence 159gradaanggg tc 12 160 12 DNA Artificial Sequence binding sequence 160gaagrahgan gg 12 161 189 PRT Escherichia coli 161 Met Lys Arg Leu IleVal Gly Ile Ser Gly Ala Ser Gly Ala Ile Tyr 1 5 10 15 Gly Val Arg LeuLeu Gln Val Leu Arg Asp Val Thr Asp Ile Glu Thr 20 25 30 His Leu Val MetSer Gln Ala Ala Arg Gln Thr Leu Ser Leu Glu Thr 35 40 45 Asp Phe Ser LeuArg Glu Val Gln Ala Leu Ala Asp Val Thr His Asp 50 55 60 Ala Arg Asp IleAla Ala Ser Ile Ser Ser Gly Ser Phe Gln Thr Leu 65 70 75 80 Gly Met ValIle Leu Pro Cys Ser Ile Lys Thr Leu Ser Gly Ile Val 85 90 95 His Ser TyrThr Asp Gly Leu Leu Thr Arg Ala Ala Asp Val Val Leu 100 105 110 Lys GluArg Arg Pro Leu Val Leu Cys Val Arg Glu Thr Pro Leu His 115 120 125 LeuGly His Leu Arg Leu Met Thr Gln Ala Ala Glu Ile Gly Ala Val 130 135 140Ile Met Pro Pro Val Pro Ala Phe Tyr His Arg Pro Gln Ser Leu Asp 145 150155 160 Asp Val Ile Asn Gln Thr Val Asn Arg Val Leu Asp Gln Phe Ala Ile165 170 175 Thr Leu Pro Glu Asp Leu Phe Ala Arg Trp Gln Gly Ala 180 185162 25 DNA Artificial Sequence primer 162 ctggaaagaa ccggaagaga tgctg 25163 25 DNA Artificial Sequence primer 163 tgaaacgact cattgtaggc atcag 25164 12 DNA Artificial Sequence target sequence 164 gctgranggg ah 12
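
The zinc finger entries in the listing above can be reduced to the four-letter motif names recited in the claims (for example QTHR or ISNR) by reading the residues at helix positions −1, +2, +3, and +6. The following Python sketch illustrates one way this might be done. It is not taken from the specification: the regular expression, the function names `to_one_letter` and `contact_motif`, and the premise that the first zinc-coordinating histidine occupies helix position +7 (so that the 12 residues between the second cysteine and that histidine end at position +6) are assumptions made for illustration only.

```python
import re

# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumed C2H2 layout (hypothetical pattern, not from the specification):
# Cys, 2-4 residues, Cys, 12 residues, His, 3-5 residues, His.
C2H2 = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")


def to_one_letter(three_letter_seq):
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[code] for code in three_letter_seq.split())


def contact_motif(finger):
    """Return the residues at helix positions -1, +2, +3, and +6.

    `finger` is a one-letter sequence of a single zinc finger domain.
    """
    match = C2H2.search(finger)
    if match is None:
        raise ValueError("no C2H2 zinc finger pattern found")
    core = match.group(1)
    # The first His is taken as position +7, so the last core residue
    # (index 11) is +6, index 8 is +3, index 7 is +2, and index 5 is -1.
    return core[5] + core[7] + core[8] + core[11]


# Sequences copied from the listing (SEQ ID NO: 102, 73, and 87).
SEQ_102 = ("Tyr Glu Cys His Asp Cys Gly Lys Ser Phe Arg Gln Ser Thr His Leu "
           "Thr Arg His Arg Arg Ile His")
SEQ_73 = ("Tyr Arg Cys Lys Tyr Cys Asp Arg Ser Phe Ser Ile Ser Ser Asn Leu "
          "Gln Arg His Val Arg Asn Ile His")
SEQ_87 = ("Tyr Lys Cys Glu Glu Cys Gly Lys Ala Phe Thr Gln Ser Ser Asn Leu "
          "Thr Lys His Lys Lys Ile His")

for name, seq in (("102", SEQ_102), ("73", SEQ_73), ("87", SEQ_87)):
    print("SEQ ID NO:", name, contact_motif(to_one_letter(seq)))
```

Applied to SEQ ID NO: 102, 73, and 87, the sketch yields QTHR, ISNR, and QSNK, respectively, which agree with motifs recited in the claims. Entries that do not fit the assumed spacing raise an error rather than returning a guessed motif.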

What is claimed is:
 1. A method of regulating expression of a gene in a prokaryotic cell, the method comprising: providing a prokaryotic cell comprising a nucleic acid encoding an artificial polypeptide, wherein the artificial polypeptide comprises a zinc finger domain, and wherein the artificial polypeptide binds to a target DNA site in a gene; expressing the nucleic acid encoding the artificial polypeptide in the cell under conditions in which the artificial polypeptide is produced, binds to the target DNA site, and regulates the gene.
 2. The method of claim 1, wherein the artificial polypeptide comprises at least three zinc finger domains.
 3. The method of claim 1, wherein the gene is an endogenous gene.
 4. The method of claim 3, wherein expression of two or more endogenous genes is regulated.
 5. The method of claim 4, wherein the artificial polypeptide regulates expression of a polycistronic RNA.
 6. The method of claim 1, wherein expression of the gene is repressed relative to expression of the gene in the absence of the artificial protein.
 7. The method of claim 1, wherein the cell is an E. coli cell.
 8. The method of claim 1, wherein the regulating alters a trait of the cell relative to a reference cell.
 9. The method of claim 8, wherein the trait is heat resistance or solvent resistance.
 10. The method of claim 3, wherein the endogenous gene encodes a decarboxylase enzyme.
 11. The method of claim 10, wherein the decarboxylase enzyme is a decarboxylase enzyme of a ubiquinone biosynthetic pathway.
 12. The method of claim 11, wherein the enzyme is a ubiX gene product.
 13. The method of claim 1, wherein expression of the nucleic acid encoding the artificial polypeptide is regulatable.
 14. The method of claim 3, further comprising characterizing the endogenous gene.
 15. The method of claim 14, wherein the characterizing comprises identifying DNA bound by the artificial polypeptide, and determining the nucleotide sequence of the endogenous gene associated with the bound DNA.
 16. The method of claim 15, wherein the isolating comprises cross-linking the artificial protein to the DNA, and immunoprecipitating the artificial protein.
 17. The method of claim 15, further comprising identifying a homolog of the endogenous gene in a second type of cell, and regulating the expression of the homolog.
 18. The method of claim 17, wherein the second type of cell is a prokaryotic cell.
 19. The method of claim 18, wherein the second type of cell is a bacterial cell.
 20. A method comprising: providing a plurality of prokaryotic cells, wherein each cell of the plurality comprises a nucleic acid encoding an artificial polypeptide, wherein the artificial polypeptide comprises a zinc finger domain, and wherein the artificial polypeptide differs among the cells of the plurality; identifying from the plurality a cell that has a trait that is altered relative to a reference cell.
 21. The method of claim 20, wherein the trait is tolerance to an organic solvent, and wherein the identifying comprises exposing cells of the plurality to the organic solvent and evaluating survival of the cells.
 22. The method of claim 20, wherein the trait is heat tolerance, and wherein the evaluating comprises exposing the cells to heat.
 23. The method of claim 20, further comprising isolating the nucleic acid encoding the artificial polypeptide from the identified cell.
 24. The method of claim 23, further comprising sequencing the nucleic acid.
 25. The method of claim 20, further comprising isolating the artificial polypeptide from the identified cell.
 26. The method of claim 20, further comprising isolating the nucleic acid encoding the artificial polypeptide from the identified cell, introducing the nucleic acid into a second plurality of cells, culturing the cells of the second plurality under conditions wherein the artificial polypeptide is produced, and identifying a cell of the second plurality having a trait that is altered relative to a reference cell.
 27. The method of claim 20, further comprising determining the sequence of the target DNA site of the artificial polypeptide.
 28. The method of claim 20, further comprising identifying an endogenous gene bound by the artificial polypeptide.
 29. The method of claim 20, further comprising analyzing the expression of one or more genes of the cell.
 30. The method of claim 28, further comprising modifying expression of the endogenous gene in a second cell.
 31. The method of claim 20, wherein the artificial polypeptide comprises at least three zinc finger domains.
 32. The method of claim 31, wherein the zinc finger domains are yeast zinc finger domains, or variants thereof.
 33. The method of claim 20, further comprising cultivating the identified cell to exploit the altered trait.
 34. A prokaryotic cell comprising: a nucleic acid encoding an artificial polypeptide, wherein the artificial polypeptide comprises a zinc finger domain, and wherein the artificial polypeptide binds to a target DNA site in a gene and regulates expression of the gene under conditions in which the nucleic acid is expressed.
 35. The cell of claim 34, wherein the artificial polypeptide regulates expression of an endogenous gene.
 36. The cell of claim 34, wherein the artificial polypeptide comprises at least three zinc finger domains.
 37. The cell of claim 35, wherein the gene is a decarboxylase.
 38. A cell selected by the method of claim 20.
 39. A polypeptide comprising at least one zinc finger domain, wherein the DNA contacting residues of the zinc finger domain at positions −1, +2, +3, and +6 correspond to a motif selected from: RSHR, HSSR, ISNR, RDHT, QTHR1, VSTR, QNTQ, and CSNR, and wherein the polypeptide regulates an endogenous prokaryotic gene.
 40. The polypeptide of claim 39, further comprising a second and third zinc finger domain, wherein the DNA contacting residues of the first, second, and third domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs RSHR, HSSR, and ISNR.
 41. The polypeptide of claim 39, further comprising a second and third zinc finger domain, wherein the DNA contacting residues of the first, second, and third domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs ISNR, RDHT, and QTHR.
 42. The polypeptide of claim 41, further comprising a fourth zinc finger domain, wherein the DNA contacting residues of the fourth domain at positions −1, +2, +3, and +6 correspond to the motif VSTR.
 43. The polypeptide of claim 39, further comprising a second and third zinc finger domain, wherein the DNA contacting residues of the first, second, and third domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QNTQ, CSNR, and ISNR.
 44. A polypeptide comprising at least one zinc finger domain, wherein the DNA contacting residues of the zinc finger domain at positions −1, +2, +3, and +6 correspond to a motif selected from: QSHV, VSNV, QSNK, RDHT, QTHR, QSSR, WSNR, VSNV, RSHR, DSAR, QTHQ, RSHR, QSNR, and CSNR, and wherein the polypeptide regulates an endogenous prokaryotic gene.
 45. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QSHV, VSNV, QSNK, and QSNK.
 46. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs RDHT, QSHV, QTHR, and QSSR.
 47. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs WSNR, QSHV, VSNV, and QSHV.
 48. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QTHR, RSHR, QTHR, and QTHR.
 49. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs DSAR, RDHT, QSHV, and QTHR.
 50. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QTHQ, RSHR, QTHR, and QTHR.
 51. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs QSHV, VSNV, QSNR, and CSNR.
 52. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs VSNV, QTHR, QSSR, and RDHT.
 53. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs RDHT, QSHV, QTHR, and QSNR.
 54. The polypeptide of claim 44, further comprising a second, third, and fourth zinc finger domain, wherein the DNA contacting residues of the first, second, third, and fourth domains at positions −1, +2, +3, and +6 of each domain respectively correspond to the motifs DSAR, RDHT, QSNK, and QTHR.
 55. A nucleic acid encoding the polypeptide of claim 39.
 56. A bacterial nucleic acid expression vector encoding the polypeptide of claim